A dual-metric framework for quantifying the biological fidelity of scRNA-seq pipelines



Abstract

Single-cell RNA sequencing (scRNA-seq) enables high-resolution transcriptional profiling, but early-stage processing pipelines differ markedly in barcode recovery, UMI correction, and read assignment, and these variations can propagate and bias downstream analyses. We present a reproducible, parameter-aware benchmarking framework to quantify the biological fidelity of four primary pipelines (STARsolo, Cell Ranger, Kallisto|Bustools, and Alevin-fry) across simulated ground-truth datasets and complex biological contexts, including a Huntington’s disease (HD) mouse model. Our approach introduces two complementary metrics: the Cluster Annotation Score (CAS), which assesses concordance between direct cell-level labels and cluster-level consensus labels, and the Marker Concordance Score (MCS), which measures the cohesion of de novo marker genes per cell type. By systematically varying the number of highly variable genes (n_HVG) and principal components (n_PC), we map how upstream quantification interacts with downstream parameter choice. Simulations show that STARsolo and Kallisto|Bustools deliver high technical accuracy, but only STARsolo consistently preserves stable cell identities and coherent marker expression across parameter regimes. In empirical datasets, alignment-based pipelines (STARsolo, Cell Ranger) yield higher CAS and MCS values and more biologically faithful annotations, while alignment-free methods show reduced signal fidelity despite faster runtimes. Differences introduced during primary processing persist after batch correction and integration, altering the detection of disease-associated cell types. Our open-source CAS-MCS-Scoring toolkit enables transparent evaluation of pipeline performance, providing a practical guide for selecting analysis strategies that maximize reproducibility and biological interpretability in scRNA-seq studies.
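
The abstract does not give a formula for CAS, so the following is only a minimal sketch of one plausible reading of "concordance between direct cell-level and cluster-level consensus labels": the fraction of cells whose own label matches the majority label of their cluster. The function names `cluster_consensus` and `concordance_score`, the majority-vote consensus, and the toy labels are all assumptions made for illustration; they are not taken from the CAS-MCS-Scoring toolkit.

```python
from collections import Counter, defaultdict


def cluster_consensus(cell_labels, cluster_ids):
    """Return a majority-vote label per cluster (assumed consensus rule).

    Ties are resolved in favor of the label seen first within the cluster.
    """
    labels_by_cluster = defaultdict(list)
    for label, cluster in zip(cell_labels, cluster_ids):
        labels_by_cluster[cluster].append(label)
    return {
        cluster: Counter(labels).most_common(1)[0][0]
        for cluster, labels in labels_by_cluster.items()
    }


def concordance_score(cell_labels, cluster_ids):
    """Fraction of cells whose direct label equals their cluster's consensus.

    Hypothetical stand-in for a CAS-like score; the published metric may
    weight clusters or cell types differently.
    """
    if len(cell_labels) != len(cluster_ids):
        raise ValueError("cell_labels and cluster_ids must have equal length")
    if not cell_labels:
        raise ValueError("at least one cell is required")
    consensus = cluster_consensus(cell_labels, cluster_ids)
    agreeing = sum(
        label == consensus[cluster]
        for label, cluster in zip(cell_labels, cluster_ids)
    )
    return agreeing / len(cell_labels)


# Toy example: six cells in two clusters, one discordant cell per cluster.
labels = ["neuron", "neuron", "astrocyte", "astrocyte", "astrocyte", "neuron"]
clusters = [0, 0, 0, 1, 1, 1]
score = concordance_score(labels, clusters)
```

Under this reading, a score near 1 means cluster-level annotation loses little information relative to per-cell annotation, and recomputing it over a grid of n_HVG and n_PC settings would show how stable that agreement is for each pipeline's output.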
