scTGCL: A Transformer-Based Graph Contrastive Learning Approach for Efficiently Clustering Single-Cell RNA-seq Data
Abstract
Single-cell RNA sequencing (scRNA-seq) enables characterization of cellular heterogeneity, but clustering remains challenging due to high dimensionality, dropout-induced sparsity, and technical noise. Existing graph-based and contrastive methods often rely on predefined similarity measures or suffer from high computational costs on large datasets. We propose single-cell Transformer-based Graph Contrastive Learning (scTGCL), a framework integrating multi-head self-attention with graph contrastive learning to learn robust cell representations. The model projects raw expression data into an embedding space and employs multi-head attention to adaptively learn weighted cell-cell graphs capturing diverse biological relationships. For contrastive augmentation, we apply random gene masking at the feature level and random edge dropping on attention matrices, simulating dropout and structural uncertainty. A symmetric contrastive loss maximizes agreement between original and augmented representations, while joint optimization with reconstruction and imputation losses preserves biological interpretability. Experiments on ten real scRNA-seq datasets demonstrate that scTGCL consistently outperforms nine state-of-the-art methods across clustering accuracy, normalized mutual information, and adjusted Rand index. Ablation studies validate each architectural component, and robustness analysis on simulated data confirms stable performance under varying dropout rates and differential expression levels. Furthermore, scTGCL exhibits superior computational efficiency, achieving substantially lower runtime on large-scale datasets compared with existing approaches. The framework provides an accurate, efficient, and scalable solution for single-cell clustering. Source code and datasets are available at https://github.com/ShoaibAbdullahKhan/scTGCL.
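The augmentation and contrastive steps described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a single attention head instead of multi-head attention, uses random untrained weights, takes an InfoNCE-style form for the symmetric contrastive loss, and omits the reconstruction and imputation losses. All function names, the masking and dropping rates, and the temperature `tau` are hypothetical choices made for illustration.

```python
import numpy as np

def mask_genes(x, rate, rng):
    """Feature-level augmentation: randomly zero out a fraction of gene entries."""
    keep = rng.random(x.shape) >= rate
    return x * keep

def attention_graph(z, w_q, w_k):
    """Single-head scaled dot-product attention over cells (assumed form).
    Returns a row-stochastic cell-cell weight matrix."""
    q, k = z @ w_q, z @ w_k
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(scores)
    return a / a.sum(axis=1, keepdims=True)

def drop_edges(a, rate, rng):
    """Structure-level augmentation: randomly drop attention edges, then renormalise rows."""
    keep = rng.random(a.shape) >= rate
    np.fill_diagonal(keep, True)  # keep self-loops so no row becomes all zeros
    a = a * keep
    return a / a.sum(axis=1, keepdims=True)

def symmetric_contrastive_loss(h1, h2, tau=0.5):
    """InfoNCE-style loss averaged over both directions (assumed form).
    The positive pair for each cell is the same cell in the other view."""
    eps = 1e-12
    h1 = h1 / (np.linalg.norm(h1, axis=1, keepdims=True) + eps)
    h2 = h2 / (np.linalg.norm(h2, axis=1, keepdims=True) + eps)
    logits = h1 @ h2.T / tau

    def nce(l):
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (nce(logits) + nce(logits.T))

# Toy data: 8 cells x 20 genes of Poisson counts, embedded into 4 dimensions.
rng = np.random.default_rng(0)
x = rng.poisson(1.0, size=(8, 20)).astype(float)
w_e = rng.normal(size=(20, 4)) * 0.1   # embedding projection (random, untrained)
w_q = rng.normal(size=(4, 4))          # query projection (random, untrained)
w_k = rng.normal(size=(4, 4))          # key projection (random, untrained)

# Original view: embed, build the attention graph, aggregate over neighbours.
z = x @ w_e
a = attention_graph(z, w_q, w_k)
h_orig = a @ z

# Augmented view: mask genes, rebuild the graph, then drop edges.
z_aug = mask_genes(x, 0.2, rng) @ w_e
a_aug = drop_edges(attention_graph(z_aug, w_q, w_k), 0.2, rng)
h_aug = a_aug @ z_aug

loss = symmetric_contrastive_loss(h_orig, h_aug)
print(f"symmetric contrastive loss: {loss:.4f}")
```

In a trained model the projection matrices would be learned by minimising this loss jointly with the other objectives; here they are fixed so that the sketch only shows how the two views are built and compared.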