A Benchmark of Semi-Supervised scRNA-seq Integration Methods in Real-World Scenarios

Xiaoyu Shen
Chuan He
Leying Guan

Abstract

Semi-supervised methods for single-cell RNA-seq integration promise to improve batch correction and biological signal preservation by leveraging cell-type labels. However, their reported benefits often rely on overly idealized settings. Here, we present the first systematic benchmark of five leading semi-supervised methods (scANVI, scGEN, ssSTACAS, scDREAMER, ItClust) against five widely used unsupervised baselines across six diverse datasets. We evaluate performance under five realistic annotation scenarios, including missing, erroneous, boundary-missing and mixed, batch-specific, and auto-generated labels, using nine established integration metrics. While semi-supervised methods show gains with perfect annotations, their robustness declines sharply under practical imperfections. Only scANVI and ssSTACAS maintain stable but modest improvements relative to their unsupervised counterparts, while none consistently outperform the strongest unsupervised method, scCRAFT. Our results highlight that current semi-supervised strategies offer limited practical advantage and that careful choice of integration method remains critical when label quality is uncertain.

Version published on bioRxiv (DOI: 10.1101/2025.08.23.671952), Aug 27, 2025