Disagreement between demultiplexing methods reveals structured cell quality gradients in multiplexed single-cell data
Abstract
Background
Single-cell multi-omics profiling of hematopoietic malignancies frequently involves pooling of patient samples before library preparation to reduce costs. Demultiplexing and quality control of the resulting sequencing data depend on experimental design, sequencing depth, and computational methods. Existing approaches benchmark individual tools, auto-select a single best method, or apply majority voting. However, none systematically exploit disagreement patterns among orthogonal strategies as a diagnostic signal for cell quality.
Results
We introduce Split-flow, a modular Nextflow pipeline that runs hashing-based demultiplexing, SNP-based demultiplexing, and transcriptome-based doublet detection in parallel, then classifies cells into quality strata through a concordance-based decision framework. Validation on multiplexed CITE-seq data from 14 multiple myeloma patients across eight Chromium channels demonstrates high reproducibility and shows that discordant cells cluster within specific cell types and quality strata. TCR clonotype cross-referencing against VDJdb confirms that concordance-based classification enriches for biologically genuine immune receptor sequences, with a 5.3-fold enrichment of confirmed public TCR sequences in the high-confidence stratum. Downsampling analysis reveals that SNP-based methods are more depth-sensitive than hash-based approaches, supporting the recommendation to combine both strategies. The framework transfers to acute myeloid leukemia (AML) samples across three assay types (snMultiome-seq, scRNA-seq, scATAC-seq), where ATAC-based demultiplexing resolves donor-assignment discordance under low hashing efficiency.
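The idea of a concordance-based decision framework can be illustrated with a minimal sketch. This is not Split-flow's actual rule set, which the abstract does not specify: the stratum names, the function `classify_cell`, and the precedence of the doublet flag over donor calls are all assumptions made here for illustration.

```python
from typing import Optional

# Hypothetical stratum labels; the abstract does not name its categories.
HIGH_CONFIDENCE = "high_confidence"
SINGLE_METHOD = "single_method_support"
DISCORDANT = "discordant"
DOUBLET = "doublet"
UNASSIGNED = "unassigned"


def classify_cell(
    hash_donor: Optional[str],
    snp_donor: Optional[str],
    is_doublet: bool,
) -> str:
    """Assign one cell to a quality stratum from three independent calls.

    hash_donor / snp_donor: donor label from each demultiplexing method,
    or None when that method made no call.
    is_doublet: flag from transcriptome-based doublet detection.
    """
    # Assumption: a doublet flag overrides any donor assignment.
    if is_doublet:
        return DOUBLET
    # Neither demultiplexing method produced a donor call.
    if hash_donor is None and snp_donor is None:
        return UNASSIGNED
    # Exactly one method produced a call, so there is no cross-check.
    if hash_donor is None or snp_donor is None:
        return SINGLE_METHOD
    # Both methods called a donor: agreement versus disagreement.
    if hash_donor == snp_donor:
        return HIGH_CONFIDENCE
    return DISCORDANT
```

Under these assumptions, `classify_cell("P1", "P1", False)` falls into the high-confidence stratum, while `classify_cell("P1", "P2", False)` is flagged as discordant rather than silently resolved in favor of one method, which is what allows disagreement to be inspected as a signal.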
Conclusions
Split-flow demonstrates that disagreement among orthogonal preprocessing methods carries structured information about cell quality, and it offers a concordance-based framework that turns this disagreement into a diagnostic signal. The approach can be applied to multiplexed single-cell experiments beyond hematopoietic malignancies.
Highlights and main findings
- Introduces Split-flow, a modular Nextflow DSL2 pipeline for preprocessing of multiplexed single-cell multi-omics sequencing data from hematopoietic malignancy samples via a post hoc concordance-based decision framework.
- Provides practical guidance for the experimental design of multiplexed single-cell multi-omics experiments, including the recommendation to combine antibody-based hashing with a SNP genotype reference for orthogonal demultiplexing.
- Reveals that SNP-based demultiplexing is more sensitive to sequencing depth than hash-based approaches, and that the combined strategy mitigates depth-dependent biases in cell-type recovery.
- Demonstrates that disagreement between demultiplexing methods contains structured diagnostic information about cell quality, with concordance categories reflecting genuine quality gradients in multiple myeloma CITE-seq samples.
- Validates the concordance framework using T cell receptor sequences as an orthogonal biological readout, with a 5.3-fold enrichment of confirmed public TCR sequences in the high-confidence stratum.
- Applies the preprocessing framework to AML patient samples across three assay types (snMultiome-seq, scRNA-seq, and scATAC-seq) and demonstrates that ATAC-based demultiplexing can resolve donor-assignment discordance.
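For readers unfamiliar with how a fold enrichment such as the one reported for public TCR sequences is typically computed, the following is a minimal sketch. It assumes the statistic is a ratio of hit rates between a stratum and a comparison group; the abstract does not state which comparison group or exact definition was used, and the function name `fold_enrichment` and the counts in the usage note are hypothetical, not taken from the study.

```python
def fold_enrichment(
    hits_in_stratum: int,
    cells_in_stratum: int,
    hits_in_comparison: int,
    cells_in_comparison: int,
) -> float:
    """Ratio of the hit rate in a stratum to the hit rate in a comparison group.

    A "hit" here stands for a cell whose TCR matches a confirmed public
    sequence; that interpretation is an assumption of this sketch.
    """
    if cells_in_stratum <= 0 or cells_in_comparison <= 0:
        raise ValueError("both groups must contain at least one cell")
    if hits_in_comparison <= 0:
        raise ValueError("comparison group needs at least one hit")
    # Cross-multiply so only a single floating-point division is performed:
    # (hits_s / cells_s) / (hits_c / cells_c) == (hits_s * cells_c) / (hits_c * cells_s)
    return (hits_in_stratum * cells_in_comparison) / (
        hits_in_comparison * cells_in_stratum
    )
```

With purely illustrative counts, `fold_enrichment(30, 1000, 10, 1000)` gives 3.0, meaning hits are three times as frequent in the stratum as in the comparison group.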