GeneJepa: A Predictive World Model of the Transcriptome

Abstract

We introduce GeneJepa, a self-supervised foundation model that learns a predictive world model of single-cell transcriptomes. Based on the Joint-Embedding Predictive Architecture, GeneJepa predicts latent representations of masked gene sets from visible context, a shift away from reconstructing noisy expression values and toward world-model style inference over cellular state. To realize this at scale, a Perceiver encoder handles variable gene sets at fixed cost, and a tokenizer jointly represents gene identity and continuous expression using Fourier features. Trained on the Tahoe-100M atlas, GeneJepa learns general representations that transfer across tissues and datasets. On downstream tasks, including drug response and perturbation prediction, it surpasses strong baselines and enables test-time scaling by progressively enlarging the cross-attention over the gene set, trading a small read cost for higher accuracy at inference. GeneJepa moves toward foundation models that reason over gene–gene relations, enabling applications in annotation, prediction, and in-silico discovery.
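The tokenizer idea in the abstract, one token per gene that carries both identity and a continuous expression value, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the log1p transform, the number of frequency bands, the random embedding table, and the choice to concatenate (rather than add or project) the two parts are all assumptions made for illustration.

```python
import numpy as np

def fourier_features(x, num_bands=8, max_freq=64.0):
    """Map scalars to sin/cos features at geometrically spaced frequencies.

    Hypothetical helper: the band count and frequency range are assumptions.
    """
    x = np.asarray(x, dtype=np.float64)[..., None]       # (..., 1)
    freqs = np.geomspace(1.0, max_freq, num_bands)       # (num_bands,)
    angles = 2.0 * np.pi * x * freqs                     # (..., num_bands)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def tokenize_cell(gene_ids, expression, gene_table, num_bands=8):
    """Build one token per gene: [identity embedding | Fourier(expression)].

    Assumption: expression is log1p-transformed before encoding, and the
    two parts are concatenated; the source does not specify either choice.
    """
    identity = gene_table[gene_ids]                      # (n_genes, d_id)
    value = fourier_features(np.log1p(expression), num_bands)
    return np.concatenate([identity, value], axis=-1)

# Toy usage with a random stand-in for a learned gene embedding table.
rng = np.random.default_rng(0)
gene_table = rng.normal(size=(1000, 16))                 # 1000 genes, 16-dim
gene_ids = np.array([3, 42, 7])
expression = np.array([0.0, 5.0, 120.0])
tokens = tokenize_cell(gene_ids, expression, gene_table)
print(tokens.shape)
```

Because each token is self-describing, a cell can be represented by any subset of its genes, which is what lets a Perceiver-style encoder cross-attend over a variable-length gene set with a fixed number of latent queries.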
