RegFormer: A Single-Cell Foundation Model Powered by Gene Regulatory Hierarchies
Abstract
Single-cell RNA sequencing (scRNA-seq) has significantly advanced our understanding of cellular diversity and of the molecular mechanisms underlying biological processes. However, existing computational models often struggle to incorporate essential biological knowledge, to handle sparse and noisy data, and to scale effectively to large datasets. To address these challenges, we introduce RegFormer, a foundation model designed specifically for scRNA-seq analysis. RegFormer integrates hierarchical relationships from gene regulatory networks (GRNs) into an architecture built on Mamba blocks, which allows it to model gene interactions and cellular states more effectively. RegFormer has approximately 50 million parameters, was pretrained on 22 million human cells, and uses dual embeddings that separately capture gene expression levels and gene identities. Aligning expression data with regulatory hierarchies in this way improves interpretability and yields more precise biological insights. Extensive evaluations show that RegFormer outperforms state-of-the-art models, including scGPT, Geneformer, scFoundation, and scBERT, across a wide range of tasks: cell annotation, GRN construction, genetic perturbation prediction, and drug response prediction. By combining modern deep learning techniques with prior biological knowledge, RegFormer improves both accuracy and interpretability and offers deeper insight into cellular processes and regulatory mechanisms, positioning it as a powerful tool for biological discovery.
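To make the two input ideas above concrete, the following is a minimal sketch, not RegFormer's actual implementation. It assumes that "aligning expression with regulatory hierarchies" means ordering gene tokens so that regulators precede their targets (a topological sort of the GRN), and that the dual embeddings are combined by element-wise summation of a gene-identity vector and an expression-bin vector. The toy GRN, the equal-width binning scheme, and the names `hierarchy_order`, `expression_bin`, and `embed_cell` are all hypothetical. The embedding tables are random stand-ins for learned parameters, and the Mamba blocks that would consume the token sequence are not sketched.

```python
import random
from graphlib import TopologicalSorter

DIM = 4      # hypothetical embedding width
N_BINS = 5   # hypothetical number of expression bins (bin 0 = not expressed)

# Hypothetical toy GRN: each gene maps to the set of regulators acting on it.
TOY_GRN = {
    "GATA1": {"TAL1"},
    "KLF1": {"GATA1"},
    "HBB": {"KLF1", "GATA1"},
    "TAL1": set(),
}


def hierarchy_order(grn):
    """Order genes so every regulator precedes its targets (topological sort)."""
    return list(TopologicalSorter(grn).static_order())


def make_table(keys, dim, seed):
    """Random vectors standing in for a learned embedding table (assumption)."""
    rng = random.Random(seed)
    return {k: [rng.gauss(0.0, 1.0) for _ in range(dim)] for k in keys}


def expression_bin(value, max_value, n_bins=N_BINS):
    """Map zero to bin 0 and non-zero values to equal-width bins 1..n_bins-1."""
    if value <= 0 or max_value <= 0:
        return 0
    return 1 + min(int(value / max_value * (n_bins - 1)), n_bins - 2)


def embed_cell(expression, grn, gene_table, bin_table):
    """Return hierarchy-ordered genes and one token vector per gene.

    Each token is the element-wise sum of the gene-identity embedding and
    the expression-bin embedding (an assumed way of combining the two).
    """
    order = hierarchy_order(grn)
    max_value = max(expression.values())
    tokens = []
    for gene in order:
        b = expression_bin(expression.get(gene, 0.0), max_value)
        tokens.append([g + e for g, e in zip(gene_table[gene], bin_table[b])])
    return order, tokens


gene_table = make_table(TOY_GRN.keys(), DIM, seed=0)
bin_table = make_table(range(N_BINS), DIM, seed=1)
cell = {"TAL1": 2.0, "GATA1": 8.0, "KLF1": 0.0, "HBB": 5.0}
order, tokens = embed_cell(cell, TOY_GRN, gene_table, bin_table)
print(order)                     # ['TAL1', 'GATA1', 'KLF1', 'HBB']
print(len(tokens), len(tokens[0]))  # 4 4
```

Keeping identity and expression in separate tables lets the same gene carry a stable identity vector while its expression signal varies from cell to cell. The hierarchy-ordered token list is the kind of sequence a state-space model such as Mamba would then scan from regulators toward their targets.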