StabCell: Stability selection for clustering and marker detection in single-cell RNA sequencing
Abstract
Conventional pipelines for differential expression analysis in single-cell RNA sequencing (scRNA-seq) data first cluster individual cells and then test for differentially expressed genes between the resulting clusters. Using the same data for clustering and testing, however, poses a selective inference problem and can result in overconfidence in differences that may not reflect true biological variation.
Results
We introduce StabCell, a stability selection framework that integrates clustering and the detection of differentially expressed marker genes. By repeatedly performing clustering and differential expression analysis on complementary random subsamples, StabCell assesses the stability of both clusters and markers, yielding a stable clustering with sets of stable marker genes. In simulations, we demonstrate that StabCell provides approximate empirical control of the per-family error rate (PFER) and selects fewer false positive marker genes than conventional approaches, especially when the signal-to-noise ratio and sequencing depth are low. Applying the method to a dataset on cell differentiation from induced pluripotent stem cells (iPSCs) to cardiomyocytes reveals that meaningful marker genes are consistently among the top-ranked genes. These results indicate that StabCell can improve the interpretability and robustness of scRNA-seq analyses.
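The subsampling idea described above can be illustrated with a minimal Python sketch. This is not the StabCell implementation (which is in R) and should not be read as the authors' algorithm: the function names `two_means`, `top_markers` and `stable_markers`, the toy 2-means clustering, the rule of selecting the top `q` genes by absolute mean difference, the simulated Poisson data, and all parameter values are assumptions made purely for illustration. The sketch draws complementary halves of the cells, clusters and selects markers within each half, and keeps genes whose selection frequency reaches a threshold.

```python
import numpy as np

def two_means(X, n_iter=20):
    """Toy 2-means clustering (illustrative stand-in for a real clustering step)."""
    # Deterministic start: the cell farthest from the overall mean,
    # then the cell farthest from that first cell.
    c0 = X[np.argmax(((X - X.mean(axis=0)) ** 2).sum(axis=1))]
    c1 = X[np.argmax(((X - c0) ** 2).sum(axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def top_markers(X, labels, q):
    """Indices of the q genes with the largest absolute mean difference."""
    if labels.min() == labels.max():  # only one cluster found: select nothing
        return np.array([], dtype=int)
    diff = np.abs(X[labels == 0].mean(axis=0) - X[labels == 1].mean(axis=0))
    return np.argsort(diff)[-q:]

def stable_markers(X, q=8, n_pairs=25, threshold=0.9, seed=0):
    """Selection frequency of each gene over complementary half-samples."""
    rng = np.random.default_rng(seed)
    n_cells, n_genes = X.shape
    half = n_cells // 2
    counts = np.zeros(n_genes)
    for _ in range(n_pairs):
        perm = rng.permutation(n_cells)
        # Two disjoint (complementary) halves of the cells.
        for idx in (perm[:half], perm[half:2 * half]):
            labels = two_means(X[idx])
            counts[top_markers(X[idx], labels, q)] += 1
    freq = counts / (2 * n_pairs)
    return np.flatnonzero(freq >= threshold), freq

# Simulated toy data (assumption): 200 cells, 100 genes,
# genes 0-4 are true markers that are up-regulated in the second group.
rng = np.random.default_rng(1)
n_cells, n_genes, n_true = 200, 100, 5
group = np.repeat([0, 1], n_cells // 2)
mean = np.full((n_cells, n_genes), 10.0)
mean[group == 1, :n_true] = 80.0
X = np.log1p(rng.poisson(mean))

stable, freq = stable_markers(X, q=8, n_pairs=25, threshold=0.9)
print("stable marker genes:", stable)
```

Because every subsample selects exactly `q` genes, the selection frequencies sum to `q`, so at most `q / threshold` genes can be declared stable. This counting argument is one simple way to see why thresholding selection frequencies limits the number of selected (and hence falsely selected) genes, which is the quantity the PFER, the expected number of false positives, refers to.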
Availability and implementation
An implementation of StabCell in the statistical programming language R is available at https://github.com/LuckyLueck/StabCell. Code to reproduce the results is available at https://github.com/LuckyLueck/StabCell_paper.