scMusketeers: Addressing imbalanced cell type annotation and batch effect reduction with a modular autoencoder
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
The growing number of single-cell gene expression atlases available offers a conceptual framework for improving our understanding of physio-pathological processes. To take full advantage of this revolution, data integration and cell annotation strategies need to be improved, in particular to better detect rare cell types and by better controlling batch effects in experiments. scMusketeers is a deep learning model that optimises the representation of latent data and solves both challenges. scMusketeers features three modules: (1) an autoencoder for noise and dimensionality reductions; (2) a focal loss classifier to enhance rare cell type predictions; and (3) an adversarial domain adaptation (DANN) module for batch effect correction. Benchmarking against state-of-the-art tools, including the UCE foundation model, showed that scMusketeers performs on par or better, particularly in identifying rare cell types. It also allows to transfer cell labels from single-cell RNA sequencing to spatial transcriptomics. With its modular and adaptable design, scMusketeers offers a versatile framework that can be generalized to other large-scale biological projects requiring deep learning approaches, establishing itself as a valuable tool for single-cell data integration and analysis.