Estimating and correcting index hopping misassignments in single-cell RNA-seq data
Abstract
Index hopping causes read assignment errors in data from multiplexed sequencing libraries. This issue has become more prevalent with the widespread use of high-capacity sequencers and highly multiplexed single-cell RNA sequencing (scRNA-seq) libraries. We conducted deep, plate-based scRNA-seq on a mixed population of mouse skin cells. Analysis of transcriptomes from 1152 cells identified four distinct cell types. To estimate the error rate in sample assignment due to index hopping, we employed differential expression analysis to identify signature genes that were highly and specifically expressed in each cell type. We quantified the proportion of misassigned reads by examining the detection rates of signature genes in other cell types. Remarkably, regardless of gene expression levels, we estimated that 0.65% of reads per gene were assigned to an incorrect cell across our data. To computationally compensate for index hopping, we developed a simple correction method wherein, for each gene, 0.65% of the library’s average expression level was subtracted from the expression of each cell. This correction had notable effects on transcriptome analyses, including increased cell-cell clustering distance and alterations in intermediate state assignments of cell differentiation. These findings underscore the potential impact of index hopping on experimental results. In conclusion, we devised a straightforward method to estimate and correct for the index hopping rate by quantifying misassigned genes in distinct cell types within an scRNA-seq library. This approach can be applied to any barcoded, multiplexed scRNA-seq library containing cells with distinct expression profiles, allowing for correction of the expression matrix before conducting biological analysis.
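The correction described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes a genes-by-cells expression matrix held in a NumPy array, takes "the library's average expression level" to mean the per-gene mean across all cells in the library, and clips negative results to zero, a choice the abstract does not specify. The function name `correct_index_hopping` and its parameters are hypothetical.

```python
import numpy as np


def correct_index_hopping(counts, hop_rate=0.0065, clip_negative=True):
    """Subtract hop_rate * (per-gene library mean) from every cell.

    counts        : array of shape (n_genes, n_cells); layout is an assumption.
    hop_rate      : fraction of reads per gene assumed misassigned (0.65% in the abstract).
    clip_negative : assumption not stated in the source; floors corrected values at zero.
    """
    counts = np.asarray(counts, dtype=float)
    # Mean expression of each gene across all cells in the library.
    gene_means = counts.mean(axis=1, keepdims=True)
    # The same per-gene offset is removed from every cell via broadcasting.
    corrected = counts - hop_rate * gene_means
    if clip_negative:
        corrected = np.clip(corrected, 0.0, None)
    return corrected


# Toy library: gene 0 is expressed in one cell only, gene 1 is uniform.
counts = np.array([
    [1000.0, 0.0, 0.0, 0.0],
    [10.0, 10.0, 10.0, 10.0],
])
corrected = correct_index_hopping(counts)
print(corrected)
```

In this toy example, gene 0 has a library mean of 250, so roughly 1.6 counts are subtracted from every cell. The expressing cell is barely affected, while cells without true expression stay at zero after clipping.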
AUTHOR SUMMARY
Index hopping causes errors in multiplexed high-throughput sequencing data whereby inaccurate barcoding leads to the misassignment of sequence reads to an incorrect source. This can result in gene expression assignments to the wrong cell in single-cell RNA sequencing (scRNA-seq). We performed scRNA-seq on sorted mouse skin cells and identified four distinct cell types based on gene expression profiles. Using genes unique to each cell type, we discovered that approximately 0.65% of total reads per gene were incorrectly assigned to another cell due to index hopping, regardless of gene expression levels. To correct for the misassigned reads, we proportionally adjusted each cell’s gene expression data. Applying this correction improved downstream analyses, enhancing cell clustering accuracy and refining the identification of intermediate cell states during cell differentiation. Our study emphasizes the significance of index hopping in scRNA-seq experiments and presents a practical correction approach to remove misassigned reads that can be applied to any scRNA-seq study involving cells with distinct gene expression profiles or where non-expressed genes are introduced as a quality control measure during library preparation.
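One way to picture the estimation step is sketched below. The summary does not give the exact estimator, so this is an assumption rather than the authors' procedure. For each signature gene, it divides the mean expression observed in cells of the other types (where the gene should be absent) by the gene's mean expression across the whole library, then takes the median across genes. The function name `estimate_hop_rate`, the genes-by-cells layout, and the toy numbers are all hypothetical.

```python
import numpy as np


def estimate_hop_rate(counts, cell_types, signature_genes):
    """Estimate the misassigned fraction from signature genes (illustrative only).

    counts          : array of shape (n_genes, n_cells); layout is an assumption.
    cell_types      : length n_cells sequence of cell-type labels.
    signature_genes : dict mapping a cell-type label to row indices of genes
                      assumed to be expressed only in that cell type.
    """
    counts = np.asarray(counts, dtype=float)
    cell_types = np.asarray(cell_types)
    rates = []
    for ctype, genes in signature_genes.items():
        # Cells of every other type should not express this type's signature genes.
        other = cell_types != ctype
        if not other.any():
            continue
        for g in genes:
            library_mean = counts[g].mean()
            if library_mean == 0:
                continue  # gene not detected anywhere, so it is uninformative
            # Signal seen in non-expressing cells, relative to the library mean.
            rates.append(counts[g, other].mean() / library_mean)
    # Median across genes (an assumed choice) limits the influence of outliers.
    return float(np.median(rates)) if rates else float("nan")


# Toy data: two cell types, one signature gene each (made-up numbers).
counts = np.array([
    [990.0, 990.0, 10.0, 10.0],   # signature gene of type "A"
    [4.0, 4.0, 196.0, 196.0],     # signature gene of type "B"
])
cell_types = ["A", "A", "B", "B"]
signature_genes = {"A": [0], "B": [1]}
rate = estimate_hop_rate(counts, cell_types, signature_genes)
print(rate)
```

Defining the rate relative to the library mean keeps this sketch consistent with a correction that subtracts a fixed fraction of each gene's library-wide average from every cell.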