mcRigor: a statistical method to enhance the rigor of metacell partitioning in single-cell data analysis
Abstract
In single-cell data analysis, addressing sparsity often involves aggregating the profiles of homogeneous single cells into metacells. However, existing metacell partitioning methods lack checks on the homogeneity assumption and may aggregate heterogeneous single cells, potentially biasing downstream analysis and leading to spurious discoveries. To fill this gap, we introduce mcRigor, a statistical method to detect dubious metacells, which are composed of heterogeneous single cells, and optimize the hyperparameter of a metacell partitioning method. The core of mcRigor is a feature-correlation-based statistic that measures the heterogeneity of a metacell, with its null distribution derived from a double permutation scheme. As an optimizer for existing metacell partitioning methods, mcRigor has been shown to improve the reliability of discoveries in single-cell RNA-seq and multiome (RNA+ATAC) data analyses, such as uncovering differential gene co-expression modules, enhancer-gene associations, and gene temporal expression. Moreover, mcRigor enables benchmarking and selection of the most suitable metacell partitioning method with optimized hyperparameters tailored to specific datasets, ensuring reliable downstream analysis. Our results indicate that among existing metacell partitioning methods, MetaCell and SEACells consistently outperform MetaCell2 and SuperCell, albeit with the trade-off of longer runtimes.
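A correlation-based heterogeneity test can be made concrete with a short sketch. The Python below is a minimal illustration under stated assumptions. It is not mcRigor's statistic and not its double permutation scheme. As a stand-in, it scores a metacell by the mean absolute pairwise Pearson correlation among features across the metacell's member cells. This score rises when cells of distinct types are pooled, because many genes then shift together. The null distribution comes from a simpler single permutation: each feature is shuffled independently across cells, which destroys feature-feature correlation while keeping each feature's values. The function names `heterogeneity_stat` and `permutation_pvalue` and the input layout (a cells × features array for one metacell) are hypothetical.

```python
import numpy as np


def heterogeneity_stat(X):
    """Hypothetical stand-in statistic (not mcRigor's): mean absolute
    off-diagonal Pearson correlation among features within one metacell.

    X: array of shape (n_cells, n_features) holding the member cells' profiles.
    """
    # Features that are constant within the metacell have undefined correlation; drop them.
    X = X[:, X.std(axis=0) > 0]
    if X.shape[1] < 2:
        return 0.0
    C = np.corrcoef(X, rowvar=False)  # feature-by-feature correlation matrix
    off_diag = C[~np.eye(C.shape[0], dtype=bool)]
    return float(np.mean(np.abs(off_diag)))


def permutation_pvalue(X, n_perm=200, seed=0):
    """Permutation test of the stand-in statistic.

    Assumption: a SINGLE permutation scheme (shuffle each feature independently
    across cells) replaces the double permutation described for mcRigor.
    Returns (observed statistic, one-sided p-value).
    """
    rng = np.random.default_rng(seed)
    observed = heterogeneity_stat(X)
    null = np.empty(n_perm)
    for b in range(n_perm):
        # Independent column shuffles break co-variation between features
        # but preserve each feature's marginal values.
        Xp = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
        null[b] = heterogeneity_stat(Xp)
    # Add-one correction keeps the p-value strictly positive.
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, float(p)


# Usage on simulated counts (illustrative data only).
rng = np.random.default_rng(1)
homogeneous = rng.poisson(5.0, size=(40, 30)).astype(float)  # one cell state
heterogeneous = np.vstack([
    rng.poisson(2.0, size=(20, 30)),   # cells of one hypothetical type
    rng.poisson(12.0, size=(20, 30)),  # cells of another type pooled in
]).astype(float)
stat_hom, p_hom = permutation_pvalue(homogeneous)
stat_het, p_het = permutation_pvalue(heterogeneous)
```

For the pooled metacell, every gene is higher in one group, so gene pairs are strongly correlated and the observed statistic sits far above its permutation null. For cells drawn from a single distribution, the observed value is exchangeable with the null draws, so its p-value is not systematically small.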