Accurate, scalable, and unified single-cell atlas integration with scBIOT
Abstract
Single-cell omics technologies have revolutionized the study of cellular diversity, yet integrating datasets across experiments remains challenging due to technical artifacts that obscure biology and confound clustering. We introduce scBIOT (Single-cell Biological Insights via Optimal Transport and Omics Transformers), a self-supervised framework that combines optimal transport alignment with Transformer-based variational autoencoders (VAEs) to learn a shared latent space across batches and modalities. This approach mitigates technical variation while preserving lineage boundaries and continuous trajectories, producing unsupervised clusters consistent with expert annotations. A semi-supervised variant, supBIOT, leverages partial labels to enhance cell-type resolution and cross-dataset consistency. Across multi-batch single-cell RNA sequencing benchmarks, scBIOT matches or outperforms leading integration methods without collapsing related subtypes, and the architecture generalizes to single-nucleus ATAC-seq and multimodal data with minimal adaptation. By integrating geometry-aware alignment with long-range feature modeling, scBIOT provides a scalable, modality-agnostic framework for high-resolution single-cell integration and analysis.
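The abstract does not say how the optimal transport alignment is formulated inside scBIOT. The sketch below only illustrates the general idea of OT-based batch alignment. It uses a generic, swapped-in technique, entropic optimal transport solved with Sinkhorn iterations followed by a barycentric projection, and it is not the authors' implementation. The function names `sinkhorn` and `barycentric_map`, the squared-Euclidean cost, the regularization `eps`, and the toy 2-D "latent codes" are all hypothetical.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two latent vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sinkhorn(src, tgt, eps=1.0, n_iter=200):
    """Entropic OT plan between uniform weights on `src` and `tgt` points.

    Plain (non-log-domain) Sinkhorn on the Gibbs kernel K = exp(-C / eps).
    This is adequate for toy inputs. A log-domain version is safer when
    eps is small relative to the costs.
    """
    n, m = len(src), len(tgt)
    cost = [[sq_dist(s, t) for t in tgt] for s in src]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a = [1.0 / n] * n  # uniform mass on source cells
    b = [1.0 / m] * m  # uniform mass on target cells
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns so the plan matches both marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def barycentric_map(plan, tgt):
    """Move each source point to the plan-weighted mean of the target points."""
    dim = len(tgt[0])
    mapped = []
    for row in plan:
        mass = sum(row)
        mapped.append(
            [sum(w * t[d] for w, t in zip(row, tgt)) / mass for d in range(dim)]
        )
    return mapped


# Hypothetical 2-D latent codes: batch_b is batch_a plus a constant offset,
# standing in for a purely technical batch effect.
batch_a = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
batch_b = [[x + 1.0, y + 1.0] for x, y in batch_a]

plan = sinkhorn(batch_b, batch_a, eps=1.0)
aligned_b = barycentric_map(plan, batch_a)
for point in aligned_b:
    print([round(c, 3) for c in point])
# prints [0.0, 0.0], then [5.0, 0.0], then [0.0, 5.0]
```

In this toy case, each shifted cell has one clearly closest counterpart, so the plan is almost a permutation and the offset disappears after projection. The entropic term keeps the plan smooth and cheap to compute. Real single-cell data would involve many more cells, a learned latent space, and unequal cell-type proportions across batches, none of which this sketch models.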