A Benchmarking Study of Random Projections and Principal Components for Dimensionality Reduction Strategies in Single Cell Analysis

Mohamed Abdelnaby
Marmar R. Moussa

Abstract

Principal Component Analysis (PCA) has long been a cornerstone of dimensionality reduction for high-dimensional data, including single-cell RNA sequencing (scRNA-seq). However, PCA’s performance typically degrades as data size increases, it can be sensitive to outliers, and it assumes linearity. Recently, Random Projection (RP) methods have emerged as promising alternatives that address some of these limitations. This study systematically evaluates PCA approaches, including Singular Value Decomposition (SVD) and randomized SVD, alongside Sparse and Gaussian Random Projection algorithms, with a focus on computational efficiency and downstream analysis effectiveness. We benchmark performance on multiple publicly available scRNA-seq datasets, both labeled and unlabeled. We apply Hierarchical Clustering and Spherical K-Means clustering to assess downstream clustering quality. For labeled datasets, clustering accuracy is measured using the Hungarian algorithm and Mutual Information. For unlabeled datasets, the Dunn Index and Gap Statistic capture cluster separation. Across both dataset types, the Within-Cluster Sum of Squares (WCSS) metric is used to assess variability. We also examine locality preservation, where RP outperforms PCA on several of the evaluated metrics. Our results demonstrate that RP not only surpasses PCA in computational speed but also rivals and, in some cases, exceeds PCA in preserving data variability and clustering quality. This benchmark offers guidance for selecting dimensionality reduction techniques that balance computational performance, scalability, and the quality of downstream analyses.
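
To make the two random projection variants named in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The matrix sizes, the seed, the synthetic "cells", the helper names, and the Achlioptas-style sparsity parameter `s = 3` are all assumptions chosen for illustration. The sketch projects 500-dimensional points down to 64 dimensions and reports how well pairwise Euclidean distances survive, which is one simple way to think about the locality preservation the abstract mentions.

```python
import math
import random


def gaussian_projection_matrix(n_features, n_components, rng):
    """Dense matrix with i.i.d. N(0, 1/n_components) entries."""
    scale = 1.0 / math.sqrt(n_components)
    return [[rng.gauss(0.0, scale) for _ in range(n_components)]
            for _ in range(n_features)]


def sparse_projection_matrix(n_features, n_components, rng, s=3.0):
    """Achlioptas-style sparse matrix (assumed variant, s is illustrative).

    Each entry is +sqrt(s/k) or -sqrt(s/k) with probability 1/(2s) each,
    and 0 otherwise, so every entry has variance 1/k like the Gaussian case.
    """
    value = math.sqrt(s / n_components)
    p = 1.0 / (2.0 * s)
    matrix = []
    for _ in range(n_features):
        row = []
        for _ in range(n_components):
            u = rng.random()
            if u < p:
                row.append(value)
            elif u < 2.0 * p:
                row.append(-value)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


def project(data, matrix):
    """Multiply data (n_samples x n_features) by matrix (n_features x k)."""
    columns = list(zip(*matrix))
    return [[sum(x * w for x, w in zip(row, col)) for col in columns]
            for row in data]


def mean_distance_ratio(original, reduced):
    """Average of (reduced distance / original distance) over all pairs."""
    ratios = []
    for i in range(len(original)):
        for j in range(i + 1, len(original)):
            ratios.append(math.dist(reduced[i], reduced[j])
                          / math.dist(original[i], original[j]))
    return sum(ratios) / len(ratios)


rng = random.Random(0)  # fixed seed, chosen arbitrarily
n_cells, n_genes, k = 20, 500, 64  # hypothetical sizes, far smaller than real data

# Synthetic stand-in for an expression matrix; not real scRNA-seq data.
cells = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_cells)]

gaussian_reduced = project(cells, gaussian_projection_matrix(n_genes, k, rng))
sparse_reduced = project(cells, sparse_projection_matrix(n_genes, k, rng))

gaussian_ratio = mean_distance_ratio(cells, gaussian_reduced)
sparse_ratio = mean_distance_ratio(cells, sparse_reduced)

print(f"Gaussian RP mean distance ratio: {gaussian_ratio:.3f}")
print(f"Sparse RP mean distance ratio:   {sparse_ratio:.3f}")
```

A ratio near 1 means distances are roughly preserved on average. Unlike PCA, neither projection matrix depends on the data, which is the usual source of the speed advantage: there is no decomposition to compute, only a matrix product.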

Version published to 10.1101/2025.02.04.636499v1 on bioRxiv
Feb 8, 2025

Related articles

RamEx: An R package for high-throughput microbial ramanome analyses with accurate quality assessment

This article has 15 authors:
1. Yanmei Zhang
2. Gongchao Jing
3. Rongze Chen
4. Yanhai Gong
5. Yuandong Li
6. Yongshun Wang
7. Xixian Wang
8. Jia Zhang
9. Yuli Mao
10. Yuehui He
11. Xiaoshan Zheng
12. Mingchao Wang
13. Hao Yuan
14. Jian Xu
15. Luyang Sun
This article has no evaluations
Latest version Mar 13, 2025

Evaluating discrepancies in dimensionality reduction for time-series single-cell RNA-sequencing data

This article has 4 authors:
1. Maren Hackenberg
2. Laia Canal Guitart
3. Rolf Backofen
4. Harald Binder
This article has no evaluations
Latest version Feb 8, 2025

Randomized Spatial PCA (RASP): a computationally efficient method for dimensionality reduction of high-resolution spatial transcriptomics data

This article has 3 authors:
1. Ian K. Gingerich
2. Brittany A. Goods
3. H. Robert Frost
This article has no evaluations
Latest version Feb 20, 2025
