A transcriptomics-native foundation model for universal cell representation and virtual cell synthesis

Xiaohui Jiang
Jichun Xie

Abstract

Current single-cell foundation models rely on language-model architectures that ignore transcriptomic data distributions, and they often underperform specialized methods. We introduce xVERSE, a transcriptomics-native foundation model that couples batch-invariant representation learning with the probabilistic generation of expression profiles. In representation learning, xVERSE outperforms the leading foundation models and batch-effect correction methods by 17.9% and 11.4%, respectively, preserving biological heterogeneity while reducing batch effects. Furthermore, xVERSE surpasses the second-best spatial imputation method by 34.3% and uniquely synthesizes virtual cells that are indistinguishable from biological data (AUROC ≈ 0.5). Used as a data-augmentation engine, xVERSE leverages these high-fidelity virtual cells to enable accurate clustering and marker detection in tiny datasets (resolving rare cell types with as few as four cells) and to improve the generalizability of cross-modality predictions across diverse pathological states. These results establish xVERSE as a transformative framework that unlocks analytical capabilities beyond those of conventional models.
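An AUROC of about 0.5 is presumably the result of a real-versus-virtual discrimination test: a classifier assigns each cell a score, and an AUROC near 0.5 means it separates real from virtual cells no better than chance, while an AUROC near 1.0 means the virtual cells are easy to spot. The abstract does not say which classifier or features were used, so the sketch below simply assumes hypothetical discriminator scores and computes AUROC with the Mann-Whitney pairwise formulation (the probability that a random real cell outscores a random virtual cell, with ties counted as 1/2). It is an illustration of the metric, not the authors' evaluation code.

```python
import random


def auroc(scores_real, scores_virtual):
    """AUROC for a discriminator that is expected to score real cells higher.

    Equals the probability that a randomly chosen real cell receives a higher
    score than a randomly chosen virtual cell; tied pairs count as 1/2.
    """
    if not scores_real or not scores_virtual:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for r in scores_real:
        for v in scores_virtual:
            if r > v:
                wins += 1.0
            elif r == v:
                wins += 0.5
    return wins / (len(scores_real) * len(scores_virtual))


rng = random.Random(0)
# Hypothetical discriminator scores for 500 real cells.
real = [rng.gauss(0.0, 1.0) for _ in range(500)]
# Virtual cells drawn from the same score distribution: indistinguishable.
virtual_good = [rng.gauss(0.0, 1.0) for _ in range(500)]
# Virtual cells whose scores are shifted: easy for the discriminator to detect.
virtual_bad = [rng.gauss(-2.0, 1.0) for _ in range(500)]

print(round(auroc(real, virtual_good), 2))  # near 0.5: chance-level separation
print(round(auroc(real, virtual_bad), 2))   # well above 0.5: detectable fakes
```

The pairwise loop is quadratic in the number of cells, which is fine for a small illustration; for large datasets a rank-based computation (or a library routine such as scikit-learn's `roc_auc_score`) gives the same value more efficiently.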

Version published to 10.64898/2026.04.12.718016 on bioRxiv
Apr 14, 2026

Related articles

Benchmarking single-cell foundation models for real-world RNA-seq data integration

This article has 9 authors:
1. Siyu Han
2. Tamas Sztanka-Toth
3. Enes Senel
4. Ahmed Elnaggar
5. Jaymala Patel
6. Tommaso Mansi
7. Denis Smirnov
8. Joel Greshock
9. Alex Javidi
This article has no evaluations. Latest version Apr 21, 2026.

scConcept enables concept-level exploration of single-cell transcriptomic data

This article has 2 authors:
1. Hegang Chen
2. Yue Li
This article has no evaluations. Latest version Apr 24, 2026.

SPEAR: Predicting Gene Expression from Single-Cell Chromatin Accessibility

This article has 2 authors:
1. Thussenthan Walter-Angelo
2. Yasin Uzun
This article has no evaluations. Latest version Apr 14, 2026.