A Fully Automated, Data-Driven Approach for Dimensionality Reduction and Clustering in Single-Cell RNA-seq Analysis

Abstract

Single-cell RNA sequencing (scRNA-seq) provides deep insights into cellular heterogeneity but demands robust dimensionality reduction (DR) and clustering to handle high-dimensional, noisy data. Many DR and clustering approaches rely on user-defined parameters, which undermines their reliability. Even automated clustering methods such as ChooseR and MultiK still use a fixed default number of principal components (PCs), so they are not fully automated. To overcome this limitation, we propose a fully automated clustering approach that integrates scLENS, a method for optimal PC selection, with these tools. Our fully automated approach improves clustering performance by ∼14% for ChooseR and ∼10% for MultiK and identifies additional cell subtypes, highlighting the advantages of adaptive, data-driven DR.
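
To make the idea of data-driven PC selection concrete, the following is a minimal sketch of one simple way to choose the number of PCs from the data instead of using a fixed default. It is not the scLENS algorithm. It assumes a plain random-matrix rule: keep the PCs whose eigenvalues exceed the Marchenko-Pastur upper edge expected for pure noise. The function names (marchenko_pastur_upper_edge, count_signal_pcs, select_num_pcs) are hypothetical and chosen only for illustration, and the input is assumed to be a normalized cells x genes matrix with no constant genes.

```python
import numpy as np


def marchenko_pastur_upper_edge(n_cells: int, n_genes: int) -> float:
    """Upper edge of the Marchenko-Pastur law for unit-variance noise."""
    gamma = n_genes / n_cells  # aspect ratio of the data matrix
    return (1.0 + np.sqrt(gamma)) ** 2


def count_signal_pcs(eigenvalues, n_cells: int, n_genes: int) -> int:
    """Count eigenvalues that lie above the pure-noise upper edge."""
    edge = marchenko_pastur_upper_edge(n_cells, n_genes)
    return int(np.sum(np.asarray(eigenvalues) > edge))


def select_num_pcs(expression: np.ndarray) -> int:
    """Choose the number of PCs for a cells x genes matrix (illustrative rule).

    Assumption: every gene has non-zero variance, so scaling is well defined.
    """
    n_cells, n_genes = expression.shape
    # Standardize each gene so that pure noise would have unit variance.
    centered = expression - expression.mean(axis=0)
    scaled = centered / centered.std(axis=0, ddof=1)
    # Eigenvalues of the gene-gene sample correlation matrix.
    corr = scaled.T @ scaled / (n_cells - 1)
    eigenvalues = np.linalg.eigvalsh(corr)
    return count_signal_pcs(eigenvalues, n_cells, n_genes)


if __name__ == "__main__":
    # Synthetic example: 3 latent factors plus independent noise.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(500, 3))
    loadings = 2.0 * rng.normal(size=(3, 100))
    data = latent @ loadings + rng.normal(size=(500, 100))
    print("selected PCs:", select_num_pcs(data))
```

The selected number of PCs could then be passed to a downstream clustering tool in place of its fixed default, which is the kind of integration the abstract describes.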

Highlights

  • Fully automated, data-driven clustering pipeline for scRNA-seq analysis.

  • Addresses the limitation of fixed principal component defaults in DR.

  • Data-driven pipeline improves clustering performance by approximately 10-14%.

  • Performance gains are most pronounced on high-sparsity, high-skewness data.

Author Summary

A fundamental goal in modern biology is to create detailed cellular maps of complex tissues to understand their function in health and disease. Achieving this level of detail is now possible through single-cell sequencing, a technology that generates massive, high-dimensional, and noisy datasets of individual cells’ genetic profiles. However, accurately identifying cell types from these datasets remains a major analytical hurdle, because current computational methods rely on subjective parameters that compromise reliability and reproducibility.

To address this challenge, we developed a fully automated pipeline that integrates scLENS, an adaptive noise filtering method, with automated cell grouping tools. By providing optimal parameters for noise filtering and cell grouping, our pipeline removes the dependence of these tools on subjective, fixed parameters and thereby significantly improves accuracy in cell-type classification. With its high clustering efficiency, this pipeline can accelerate discoveries in fields such as cancer biology and immunology by providing a robust and reproducible analytical platform.
