JIND-Multi: Leveraging Multiple Labeled Datasets for Automated Annotation of Single-Cell RNA and ATAC Data

Joseba Sancho
Akash Kanhirodan
Xabier Garrote
Olivier Gevaert
Mikel Hernaez
Guillermo Serrano
Idoia Ochoa

Abstract

Background

The creation of single-cell atlases is essential for understanding cellular diversity and heterogeneity. However, assembling these atlases is challenging due to batch effects and the need for accurate cell annotation. Current computational methods for single-cell RNA and ATAC sequencing data, while effective for dataset integration, are not optimized for cell annotation. Additionally, many annotation tools rely on external databases or reference scRNA-Seq datasets, which may limit their adaptability to specific study needs, especially for rare cell types or scATAC-Seq data.

Results

We introduce JIND-Multi, an extended version of the JIND framework, designed to transfer cell-type labels across multiple annotated datasets. JIND-Multi significantly reduces the proportion of unclassified cells in single-cell RNA sequencing (scRNA-Seq) data while maintaining the accuracy and performance of the original JIND model. Furthermore, JIND-Multi produces robust and precise annotations in its first application to scATAC-Seq data, demonstrating its versatility and effectiveness across different single-cell sequencing technologies.
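The abstract does not spell out how a cell ends up "unclassified". One common mechanism, assumed here for illustration and not taken from the JIND-Multi code, is to reject a prediction when the classifier's highest probability for a cell falls below a confidence threshold for the predicted cell type. The minimal sketch below uses hypothetical names (`annotate_with_rejection`, `unassigned_fraction`, the `"Unassigned"` label) and toy probabilities to show how the proportion of unclassified cells would be computed under that assumption.

```python
from typing import Dict, List, Sequence

# Hypothetical label for rejected cells (an assumption, not JIND-Multi's output format).
UNASSIGNED = "Unassigned"


def annotate_with_rejection(
    probs: Sequence[Sequence[float]],
    classes: Sequence[str],
    thresholds: Dict[str, float],
) -> List[str]:
    """Assign each cell its most probable cell type, or mark it as unassigned
    when that top probability is below the cell type's threshold."""
    labels: List[str] = []
    for row in probs:
        # Index of the most probable class for this cell.
        best = max(range(len(classes)), key=lambda k: row[k])
        cell_type = classes[best]
        # Types without an explicit threshold are never rejected (threshold 0.0).
        if row[best] >= thresholds.get(cell_type, 0.0):
            labels.append(cell_type)
        else:
            labels.append(UNASSIGNED)
    return labels


def unassigned_fraction(labels: Sequence[str]) -> float:
    """Proportion of cells left unclassified."""
    if not labels:
        return 0.0
    return sum(1 for lab in labels if lab == UNASSIGNED) / len(labels)


if __name__ == "__main__":
    # Toy per-cell class probabilities (made up for illustration).
    classes = ["B", "T", "NK"]
    probs = [
        [0.90, 0.05, 0.05],  # confident B cell
        [0.40, 0.35, 0.25],  # ambiguous: top probability below the B threshold
        [0.10, 0.20, 0.70],  # confident NK cell
    ]
    thresholds = {"B": 0.5, "T": 0.5, "NK": 0.6}
    labels = annotate_with_rejection(probs, classes, thresholds)
    print(labels)                       # ['B', 'Unassigned', 'NK']
    print(unassigned_fraction(labels))  # 0.3333333333333333
```

Under this assumption, lowering the share of rejected cells without losing accuracy requires better-calibrated, more confident predictions, which is the kind of effect the abstract attributes to training on multiple labeled datasets.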

Conclusions

JIND-Multi represents an improvement in cell annotation, reducing the number of unassigned cells and offering a reliable solution for both scRNA-Seq and scATAC-Seq data. Its ability to handle multiple labeled datasets enhances the precision of annotations, making it a valuable tool for the single-cell research community. JIND-Multi is publicly available at: https://github.com/ML4BM-Lab/JIND-Multi.git.

Version published to 10.1101/2025.01.15.633130 on bioRxiv
Jan 19, 2025