SCARF: Single Cell ATAC-seq and RNA-seq Foundation model

Abstract

Recent advances in single-cell multi-omics have provided unprecedented insights into gene regulation by jointly profiling transcriptomic (scRNA-seq) and chromatin accessibility (scATAC-seq) landscapes. However, the inherent heterogeneity and high dimensionality of these multimodal data present significant challenges for effective integration and downstream analysis. Foundation models have demonstrated strong representation learning capabilities for scRNA-seq or scATAC-seq data individually. So far, however, no model has been developed specifically for the integrative analysis of the two modalities. Here, we introduce SCARF, a single-cell ATAC-seq and RNA-seq foundation model. SCARF is pre-trained on X-Omics, the largest curated collection of single-cell multi-omics data to date, comprising over 2.7 million cells across multiple tissues and species. The model uses a Mamba architecture to efficiently capture long-context relationships between genes and between accessible regions. It learns modality-specific features through self-supervised learning and shared features through contrastive learning. SCARF achieves state-of-the-art performance on multiple downstream tasks, including cell representation, cell matching, and cross-omics translation. Furthermore, SCARF enables few-shot cell type annotation, demonstrating strong generalizability to previously unseen datasets. These results highlight the power of foundation models for advancing integrative analysis of single-cell multi-omics data, with broad applications in important tasks including cellular characterization, gene or genomic perturbation analysis, and regulatory network analysis.
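
The abstract does not specify the contrastive objective used to learn shared features. As a rough illustration only, the sketch below assumes a symmetric InfoNCE-style loss over paired RNA and ATAC cell embeddings, where each cell's two modalities form the positive pair and all other cells in the batch act as negatives. The function names, the temperature value, and the nearest-neighbour matching helper are hypothetical and are not taken from SCARF.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def symmetric_info_nce(rna_emb, atac_emb, temperature=0.1):
    """Assumed contrastive loss: row i of both inputs comes from the same cell.

    The loss is averaged over both directions (RNA -> ATAC and ATAC -> RNA).
    The temperature of 0.1 is an illustrative default, not a value from the paper.
    """
    rna = l2_normalize(rna_emb)
    atac = l2_normalize(atac_emb)
    logits = rna @ atac.T / temperature  # (n_cells, n_cells) similarity matrix
    idx = np.arange(len(rna))
    loss_rna_to_atac = -log_softmax(logits)[idx, idx].mean()
    loss_atac_to_rna = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_rna_to_atac + loss_atac_to_rna)


def match_cells(rna_emb, atac_emb):
    """Hypothetical cell matching: nearest ATAC embedding for each RNA embedding."""
    similarity = l2_normalize(rna_emb) @ l2_normalize(atac_emb).T
    return similarity.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rna = rng.normal(size=(8, 16))
    atac = rna + 0.01 * rng.normal(size=(8, 16))  # toy, nearly aligned embeddings
    print("loss (aligned pairs):", symmetric_info_nce(rna, atac))
    print("matched ATAC index per RNA cell:", match_cells(rna, atac))
```

In a shared embedding space trained this way, tasks such as cell matching reduce to a similarity search between modalities, which is what the toy `match_cells` helper illustrates.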
