sCIN: A Contrastive Learning Framework for Single-Cell Multi-omics Data Integration

Abstract

The rapid advancement of single-cell omics technologies such as scRNA-seq and scATAC-seq has transformed our understanding of cellular heterogeneity and regulatory mechanisms. However, integrating these data types remains challenging due to distributional discrepancies and distinct feature spaces. To address this, we present a novel single-cell Contrastive INtegration framework (sCIN) that integrates different omics modalities into a shared low-dimensional latent space. sCIN uses modality-specific encoders and contrastive learning to generate latent representations for each modality, aligning cells across modalities and removing technology-specific biases. The framework was designed to rigorously prevent data leakage between training and testing, and was extensively evaluated on three real-world paired datasets: SHARE-seq, 10X PBMC (10k version), and CITE-seq. Paired datasets are multi-omics data generated by technologies that capture different omics features from the same cell population. On these paired datasets, sCIN outperforms alternative models, including Con-AAE, Harmony, and MOFA, across multiple metrics: Average Silhouette Width (ASW) for clustering quality, and Recall@k, cell type accuracy, and Median Rank for integration quality. Moreover, sCIN was evaluated on simulated unpaired datasets derived from paired data, demonstrating its ability to leverage available biological information for effective multimodal integration. In summary, sCIN reliably integrates omics modalities while preserving biological meaning in both paired and unpaired settings.
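The abstract does not give implementation details, but the core idea it describes, using contrastive learning to pull the two modality-specific embeddings of the same cell together while pushing apart embeddings of different cells, can be sketched with a symmetric InfoNCE-style objective. The function names, encoder outputs, and temperature value below are illustrative assumptions, not sCIN's actual code:

```python
import numpy as np

def l2_normalize(x, axis=1, eps=1e-8):
    """Project embeddings onto the unit sphere so dot products are cosines."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def contrastive_alignment_loss(z_rna, z_atac, temperature=0.1):
    """Symmetric InfoNCE loss over paired cells: for cell i, its embedding
    in one modality should match embedding i in the other modality
    (positive pair); all other cells in the batch act as negatives."""
    z1 = l2_normalize(z_rna)
    z2 = l2_normalize(z_atac)
    logits = z1 @ z2.T / temperature          # (n_cells, n_cells) similarities
    n = logits.shape[0]

    def cross_entropy_diag(lg):
        # log-softmax per row, then pick the diagonal (matched-cell) entries
        lg = lg - lg.max(axis=1, keepdims=True)
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # average of RNA->ATAC and ATAC->RNA retrieval directions
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

# Toy check: paired embeddings derived from a shared cell state should score
# a lower loss than embeddings with no correspondence at all.
rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 16))                      # latent cell state
z_rna = shared + 0.05 * rng.normal(size=(8, 16))       # hypothetical encoder outputs
z_atac = shared + 0.05 * rng.normal(size=(8, 16))
loss_paired = contrastive_alignment_loss(z_rna, z_atac)
loss_random = contrastive_alignment_loss(z_rna, rng.normal(size=(8, 16)))
print(loss_paired < loss_random)
```

In a full model, `z_rna` and `z_atac` would come from trainable modality-specific encoders, and minimizing this loss is what aligns cells across modalities in the shared latent space.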
