A Fully Automated, Data-Driven Approach for Dimensionality Reduction and Clustering in Single-Cell RNA-seq Analysis

Hyun Kim
Faeyza Rishad Ardi
Kévin Spinicci
Jae Kyoung Kim

Abstract

Single-cell RNA sequencing (scRNA-seq) provides deep insights into cellular heterogeneity but demands robust dimensionality reduction (DR) and clustering to handle high-dimensional, noisy data. Many DR and clustering approaches rely on user-defined parameters, which undermines their reliability. Even automated clustering methods such as ChooseR and MultiK still use a fixed default number of principal components (PCs), so they are not fully automated. To overcome this limitation, we propose a fully automated clustering approach that integrates scLENS, a method for optimal PC selection, with these tools. Our approach improves clustering performance by ∼14% for ChooseR and ∼10% for MultiK and identifies additional cell subtypes, highlighting the advantages of adaptive, data-driven DR.
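
To illustrate what choosing the number of PCs from the data, rather than from a fixed default, can look like, here is a minimal sketch. It is not the scLENS algorithm: as a stand-in, it uses a simple rule that keeps the PCs whose eigenvalues exceed the Marchenko-Pastur upper edge expected for pure noise. The function names `select_num_pcs` and `simulate`, the synthetic data, and the fixed default of 50 are all assumptions made for illustration.

```python
import numpy as np


def select_num_pcs(expr: np.ndarray) -> int:
    """Count PCs whose eigenvalue exceeds the Marchenko-Pastur upper edge.

    expr is a cells x genes matrix. This is an illustrative stand-in rule,
    not the scLENS procedure.
    """
    n_cells, n_genes = expr.shape
    # Standardize each gene so that pure noise would have unit variance.
    centered = expr - expr.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # guard against constant genes
    z = centered / std
    # Eigenvalues of the gene-gene correlation matrix.
    eigvals = np.linalg.eigvalsh(z.T @ z / n_cells)
    # Upper edge of the noise-only eigenvalue spectrum for unit variance.
    mp_edge = (1.0 + np.sqrt(n_genes / n_cells)) ** 2
    return int(np.sum(eigvals > mp_edge))


def simulate(n_cells=500, n_genes=200, n_factors=3, seed=0) -> np.ndarray:
    """Synthetic data: a few latent factors plus Gaussian noise (hypothetical)."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_cells, n_factors))
    loadings = rng.normal(size=(n_factors, n_genes))
    noise = rng.normal(size=(n_cells, n_genes))
    return latent @ loadings + noise


FIXED_DEFAULT_PCS = 50  # hypothetical fixed default, shown only for contrast
n_pcs = select_num_pcs(simulate())
print(f"data-driven PCs: {n_pcs}; fixed default: {FIXED_DEFAULT_PCS}")
```

On synthetic data with three planted factors, this rule keeps a number of PCs close to the true signal dimension, whereas a fixed default would pass dozens of noise-dominated components to the clustering step. The PC count chosen this way would then feed an automated clustering tool in place of its default.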

Highlights

Fully automated, data-driven clustering pipeline for scRNA-seq analysis.
Addresses the limitation of fixed principal component defaults in DR.
Data-driven pipeline improves clustering performance by 10-14%.
Performance gains are most pronounced on high-sparsity, high-skewness data.

Author Summary

A fundamental goal in modern biology is to create detailed cellular maps of complex tissues to understand their function in health and disease. Achieving this level of detail is now possible through single-cell sequencing, a technology that generates massive, high-dimensional, and noisy datasets of individual cells’ genetic profiles. However, accurately identifying cell types from these datasets remains a major analytical hurdle, as current computational methods rely on subjective parameters that compromise reliability and reproducibility.

To address this challenge, we developed a fully automated pipeline that integrates scLENS, an adaptive noise filtering method, with automated cell grouping tools. By providing optimal parameters for noise filtering and cell grouping, our pipeline removes the reliance of these tools on subjective, fixed parameters, thereby significantly improving accuracy in cell-type classification. With its high clustering efficiency, this pipeline can accelerate discoveries in fields such as cancer biology and immunology by providing a robust and reproducible analytical platform.

Version published to 10.1101/2025.10.06.680609 on bioRxiv
Oct 6, 2025
