On the benchmarking of clustering algorithms and hyperparameter influence for cell type detection in single-cell RNA sequencing data
Abstract
Clustering single-cell RNA-seq (scRNA-seq) data remains a major challenge due to high dimensionality and noise. Despite numerous benchmarking studies aiming to identify the best clustering methods, many suffer from methodological flaws that undermine their conclusions. A key difficulty in benchmarking is selecting representative datasets that cover the diversity of scRNA-seq experiments and include laboratory-verified labels for reliable evaluation. Consistent preprocessing of all inputs to the benchmarked algorithms is also crucial, as preprocessing significantly impacts performance. Beyond selecting an algorithm, thorough exploration of hyperparameters is essential to assess robustness and identify the configurations that maximize performance. After identifying common methodological issues in clustering benchmarking studies, we propose a new benchmarking framework designed to address these limitations. Our framework mitigates potential biases from reference labels that may have been produced with the aid of clustering, enabling the inclusion of datasets beyond the pool of gold-standard datasets with purely lab-verified labels. We illustrate our methodology by comparing the classic Leiden and Louvain clustering algorithms, with an extensive hyperparameter exploration comprising 34,911,660 combinations across 22 datasets. The results did not reveal significant performance differences between the two algorithms. We also show that often-overlooked factors, such as graph construction and the choice of quality function, critically influence clustering outcomes. Jaccard-weighted graphs paired with the best-performing quality functions maximize performance, while UMAP-weighted graphs can be more robust to variations in the neighborhood size k used for k-nearest-neighbor graph construction. Overall, our framework establishes a more reliable standard for benchmarking scRNA-seq clustering algorithms and lays a foundation for more accurate evaluation in future studies.
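To make the notion of a Jaccard-weighted graph more concrete, the following is a minimal sketch of one common construction, popularized by tools such as PhenoGraph. Each k-nearest-neighbor edge (i, j) is weighted by the Jaccard index |N(i) ∩ N(j)| / |N(i) ∪ N(j)| of the two cells' neighbor sets. The sketch assumes Euclidean distance on an already preprocessed matrix (e.g., PCA coordinates), counts each cell as its own nearest neighbor, and uses brute-force distances. The function names `knn_indices` and `jaccard_knn_graph` are hypothetical. This is an illustration, not necessarily the exact construction evaluated in the study.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row of X (Euclidean),
    with each point counted as its own first neighbor (an assumption;
    conventions differ between tools)."""
    # Pairwise squared Euclidean distances via broadcasting: shape (n, n).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Force each point to rank first in its own row, even with duplicate points.
    np.fill_diagonal(d2, -np.inf)
    # Stable sort so ties between other points are broken by index.
    return np.argsort(d2, axis=1, kind="stable")[:, :k]


def jaccard_knn_graph(X, k):
    """Symmetric weighted adjacency matrix: an edge (i, j) exists if j is among
    i's k nearest neighbors (or vice versa), weighted by the Jaccard index of
    the two neighbor sets."""
    n = X.shape[0]
    nbrs = knn_indices(X, k)
    sets = [set(row.tolist()) for row in nbrs]
    W = np.zeros((n, n))
    for i in range(n):
        for j in nbrs[i]:
            if j == i:
                continue  # no self-loops
            inter = len(sets[i] & sets[j])
            union = len(sets[i] | sets[j])
            # The Jaccard index is symmetric, so both directions get the same weight.
            W[i, j] = W[j, i] = inter / union
    return W


if __name__ == "__main__":
    # Toy 1-D example: three cells at positions 0, 1 and 3, with k = 2.
    X = np.array([[0.0], [1.0], [3.0]])
    W = jaccard_knn_graph(X, k=2)
    # Neighbor sets: {0, 1}, {1, 0}, {2, 1}
    # W[0, 1] == 1.0, W[1, 2] == 1/3, W[0, 2] == 0.0
    print(W)
```

In practice the resulting matrix would be handed to a community-detection method such as Leiden or Louvain, and the O(n²) distance matrix would be replaced by an (approximate) nearest-neighbor search for realistic dataset sizes.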