Unlocking biological insight from single-cell data with an interpretable dual-stream foundation model

Honglie Guo
Qinghang Cui
Xiang Zhang
Chaowei Chen
Weihua Zheng
Changfeng Cai
Xinyi Wang
Shunfang Wang

Abstract

Deep learning foundation models are revolutionizing single-cell biology, yet learning holistic and discriminative representations from complex, high-dimensional data remains a central challenge. Although Transformer-based single-cell language models have made significant progress, they typically rely on a single input-encoding scheme, a practice that loses critical gene expression information and hinders the effective learning of global cellular representations. To address these challenges, we introduce scDMC, an innovative dual-stream contrastive pre-training framework designed to synergistically optimize information fidelity at both the gene and cell levels. Pre-trained on only 2 million cells, far fewer than the datasets used by mainstream models, scDMC sets a new state of the art on multiple benchmark tasks, including cell annotation, clustering, and data integration. More importantly, we demonstrate that scDMC can uncover functional gene modules and infer cell-type-specific regulatory networks in a data-driven manner, and that it exhibits a high degree of biological interpretability. This work demonstrates an efficient pre-training approach that paves the way for the next generation of powerful and interpretable single-cell foundation models, promising to accelerate the pace of biological discovery.

Version published to 10.1101/2025.09.05.674596 on bioRxiv
Sep 11, 2025

Related articles

A Diffusion-Based Autoencoder for Learning Patient-Level Representations from Single-Cell Data

This article has 5 authors:
1. Rebecca Boiarsky
2. Johann Wenckstern
3. Nicholas J. Haradhvala
4. Gad Getz
5. David Sontag
This article has no evaluations
Latest version Aug 25, 2025

CellPatch: A Flexible and Efficient Framework for Single-Cell Foundation Model Empowered by Heuristic Gene Patching

This article has 15 authors:
1. Hanwen Zhu
2. Yushun Yuan
3. Jiyuan Yang
4. Kangwen Cai
5. Nana Wei
6. Senxin Zhang
7. Lu Wang
8. Wen-Jie Jiang
9. YuanChen Sun
10. An Liu
11. Futing Lai
12. Yu-Juan Wang
13. Zeyu Ma
14. Xiaoqi Zheng
15. Hua-Jun Wu
This article has no evaluations
Latest version Aug 6, 2025

Interpretable deep generative ensemble learning for single-cell omics with Hydra

This article has 7 authors:
1. Manoj M Wagle
2. Chunlei Liu
3. Zunpeng Liu
4. Yongheng Wang
5. Manolis Kellis
6. Ellis Patrick
7. Pengyi Yang
This article has no evaluations
Latest version Aug 21, 2025
