CoxFormer enables spatial omics inference with multimodal generative modeling
Abstract
Gene co-expression provides a principled basis for transcriptome-wide gene–gene relationships, yet the available high-quality estimates typically cover less than half of the genome. Spatial omics currently faces complementary measurement constraints: targeted in situ platforms restrict gene coverage, whereas sequencing-based assays lack cellular resolution. When extended to the entire transcriptome, co-expression relationships may help to address these limitations by enabling the generation of unassayed gene expression data at subcellular resolution. Here, we present CoxFormer (Co-expression pre-trained Transformer), a universal gene-embedding framework that uniquely integrates literature-derived gene knowledge with data-driven co-expression networks from bulk tissues and large-scale single-cell atlases to model transcriptome-wide gene–gene relationships. CoxFormer learns robust 512-dimensional embeddings for 32,016 human genes, capturing their functional and regulatory structure. It also serves as a generative prior, deployed through a flexible framework for transcriptome-wide spatial inference under multimodal contexts. We demonstrate that this generative framework unifies four key spatial applications that extend beyond measured genes: histology-based imputation, gene activity prediction for chromatin accessibility, subcellular super-resolution inference for unassayed genes, and the detection of pathological tissue regions. Ultimately, CoxFormer provides a versatile foundation for multimodal spatial omics analysis, enabling whole-transcriptome insights beyond the constraints of limited gene panels.
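As a point of reference for the term used above, co-expression between two genes is commonly quantified as the correlation of their expression profiles across samples. The following is a minimal sketch of that general idea using Pearson correlation. It is not CoxFormer's pipeline, and the gene names, the values, and the helper functions `pearson` and `coexpression_matrix` are hypothetical illustrations.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a gene with constant expression.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def coexpression_matrix(expr):
    """Build a gene-by-gene correlation table.

    `expr` maps a gene name to its expression values, measured across
    the same ordered set of samples for every gene.
    """
    genes = sorted(expr)
    return {g: {h: pearson(expr[g], expr[h]) for h in genes} for g in genes}


# Toy data (hypothetical genes and values, four samples each).
toy_expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],  # rises with GENE_A
    "GENE_C": [4.0, 3.0, 2.0, 1.0],  # falls as GENE_A rises
}

coex = coexpression_matrix(toy_expr)
print(coex["GENE_A"]["GENE_B"])  # positively co-expressed pair
print(coex["GENE_A"]["GENE_C"])  # negatively co-expressed pair
```

A table of this kind can only be estimated reliably for genes with sufficient high-quality measurements, which is the coverage gap that the abstract describes.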