Cellular morphology emerges from polygenic, distributed transcriptional variation

Abstract

Height and most disease risks are well-known polygenic traits: characteristics governed by many genes at different loci rather than a select few. Although we are beginning to understand how genetic variation affects cell morphology, whether an analogous polygenic architecture operates at the cellular level, where morphology integrates cytoskeletal organization, organelle positioning, and metabolic state, has yet to be systematically tested. Here, we demonstrate that cellular morphology behaves as a polygenic trait by integrating multimodal modeling, perturbation profiling, and population-scale genetic variation. A shared latent-space autoencoder trained on four large-scale perturbation datasets predicts morphology from gene expression and generalizes without retraining to matched RNA-seq and Cell Painting profiles from 100 genetically diverse iPSC donors. The model predicts 17 morphological features (R² > 0.6, permutation FDR q < 0.05), enriched for spatial organelle distribution and cytoskeletal architecture. Predictive performance does not arise from dominant gene–phenotype relationships: individual genes contribute modestly, and marginal gene–morphology correlations are uniformly weak, revealing a distributed regulatory architecture. Despite this polygenicity, CRISPR perturbation data from the JUMP consortium validate specific model-prioritized genes, such as the cytoskeletal regulator TIAM1, the membrane trafficking factor RAB31, and the mitochondrial-associated membrane transporter ABCC5, as molecular anchors whose disruption produces feature-specific morphological shifts. Transcriptome-wide association analyses identify correlational variant–gene–morphology chains linking cis-regulatory variation through mitochondrial metabolism (PDHX) and iron transport (SLC11A2) to cellular architecture.
These results establish cellular morphology as a polygenic systems phenotype, extending the omnigenic framework to the cellular level and providing a biological basis for interpreting cross-modal prediction in functional genomics.

Article activity feed

  1. Gene expression and morphology inputs were independently mapped to a shared latent space of dimension 150,

    Did you assess how much information is required to encode the phenotypic and expression representations? I assume you are using single-precision numbers. A latent space of 150 32-bit values is likely larger than the representations you care about require. How are you controlling for overfitting?

  2. A shared latent space autoencoder framework was used, adapted from the cross-modal architecture introduced by Yang et al. [13]. The model consists of modality-specific encoders and decoders for gene expression and morphology, which project inputs into a shared latent representation.

    Are you familiar with https://thestacks.org/publications/result-g-p-atlas ?
    It is an extremely similar architecture, but it adds regularization in the form of denoising and a short training round to align the latent representations, which allows use of whatever activation you want.
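
For readers following the two comments above, the layout under discussion can be sketched roughly as follows. This is a minimal illustration, not the preprint's model and not the architecture behind the linked page: only the latent dimension of 150 comes from the quoted methods text. The sample, gene, and feature counts, the single linear layers, the tanh activation, the Gaussian input corruption used as a stand-in for denoising, and the mean-squared alignment term on paired samples are all assumptions made for the example. It computes loss terms only and omits the training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; only LATENT_DIM = 150 comes from the quoted methods text.
N_CELLS, N_GENES, N_FEATURES, LATENT_DIM = 64, 500, 300, 150


def init_linear(n_in, n_out):
    """Small random weights for one linear layer (no bias, for brevity)."""
    return rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out))


# One encoder and one decoder per modality, all meeting in the same latent space.
params = {
    "enc_expr": init_linear(N_GENES, LATENT_DIM),
    "dec_expr": init_linear(LATENT_DIM, N_GENES),
    "enc_morph": init_linear(N_FEATURES, LATENT_DIM),
    "dec_morph": init_linear(LATENT_DIM, N_FEATURES),
}


def encode(x, weights, noise_sd=0.0):
    """Encode, optionally corrupting the input first (assumed denoising-style regularizer)."""
    corrupted = x + rng.normal(scale=noise_sd, size=x.shape)
    return np.tanh(corrupted @ weights)


def mse(a, b):
    """Mean squared error as a plain Python float."""
    return float(np.mean((a - b) ** 2))


def loss_terms_for(expr, morph, noise_sd=0.1):
    """Loss terms for one batch of paired expression and morphology profiles."""
    z_expr = encode(expr, params["enc_expr"], noise_sd)
    z_morph = encode(morph, params["enc_morph"], noise_sd)
    return {
        # Each modality is reconstructed from its own latent code.
        "recon_expr": mse(z_expr @ params["dec_expr"], expr),
        "recon_morph": mse(z_morph @ params["dec_morph"], morph),
        # Cross-modal term: morphology predicted from expression.
        "cross_expr_to_morph": mse(z_expr @ params["dec_morph"], morph),
        # Assumed alignment term pulling paired latent codes together.
        "align": mse(z_expr, z_morph),
    }


# Random stand-in data; real inputs would be normalized expression and imaging features.
expr = rng.normal(size=(N_CELLS, N_GENES))
morph = rng.normal(size=(N_CELLS, N_FEATURES))

loss_terms = loss_terms_for(expr, morph)
# Inference path: expression encoder followed by morphology decoder, no corruption.
predicted_morph = encode(expr, params["enc_expr"]) @ params["dec_morph"]
```

In a sketch like this, the overfitting question from the first comment would be probed by shrinking LATENT_DIM and raising noise_sd while tracking the cross-modal term on held-out samples.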