CAdir: Fast Clustering and Visualization of Single-Cell Transcriptomics Data by Direction in CA Space

Clemens Kohl
Martin Vingron


Abstract

Clustering of single-cell RNA-seq data aims to group similar cells into biologically meaningful clusters. Many available clustering algorithms, however, do not provide the cluster-defining marker genes, cannot infer the number of clusters in an unsupervised manner, or lack tools to easily assess the quality of the label assignments. Therefore, clustering quality is commonly evaluated by visually inspecting low-dimensional embeddings produced by, e.g., UMAP or t-SNE. These embeddings can, however, distort the true cluster structure and are known to change radically depending on the chosen hyperparameters. Determining clustering quality therefore still relies heavily on domain knowledge to assess whether cells should be clustered together. To improve the interpretability of clustering results, we developed CAdir (https://github.com/VingronLab/CAdir), a clustering algorithm that infers the number of clusters in the data, determines cluster-specific genes, and provides easy-to-interpret diagnostic plots. CAdir exploits the geometry induced by correspondence analysis (CA) to cluster cells, as well as cluster-associated genes, based on their direction in CA space. Using the angle between the cluster directions, it automatically infers the number of clusters in the data by merging and splitting clusters. A comprehensive set of diagnostic and explanatory plots gives users feedback on the clustering decisions and on the quality of the final and intermediate clusters. CAdir scales to even the largest data sets and, in our benchmarking, performs similarly to other state-of-the-art cell clustering algorithms.
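To make the idea of clustering "by direction in CA space" more concrete, the following is a minimal NumPy sketch. It is not the CAdir implementation (the package at the URL above should be consulted for that). The function names `ca_row_coords`, `cluster_by_direction`, and `merge_by_angle`, the mean-direction update (a spherical k-means style stand-in), and the 30 degree merge threshold are all assumptions made for illustration, and the splitting step mentioned in the abstract is omitted.

```python
import numpy as np


def ca_row_coords(counts, n_dims=2):
    """Correspondence analysis: principal coordinates of the rows (cells)."""
    P = counts / counts.sum()
    r = P.sum(axis=1)  # row masses
    c = P.sum(axis=0)  # column masses
    # Standardized (Pearson) residuals of the count table
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, d, _ = np.linalg.svd(S, full_matrices=False)
    # Row principal coordinates: D_r^{-1/2} U D
    return (U[:, :n_dims] * d[:n_dims]) / np.sqrt(r)[:, None]


def cluster_by_direction(X, k, n_iter=20, seed=0):
    """Assign each point to the cluster direction with the smallest angle.

    Illustrative stand-in: directions are updated as normalized mean
    directions of their members, which may differ from what CAdir does.
    """
    rng = np.random.default_rng(seed)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    dirs = unit[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmax(unit @ dirs.T, axis=1)  # largest cosine wins
        for j in range(k):
            m = unit[labels == j].sum(axis=0)
            norm = np.linalg.norm(m)
            if norm > 0:  # keep the old direction for empty clusters
                dirs[j] = m / norm
    labels = np.argmax(unit @ dirs.T, axis=1)
    return labels, dirs


def merge_by_angle(labels, dirs, max_angle_deg=30.0):
    """Merge clusters whose directions differ by less than max_angle_deg."""
    k = len(dirs)
    parent = list(range(k))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    angles = np.degrees(np.arccos(np.clip(dirs @ dirs.T, -1.0, 1.0)))
    for i in range(k):
        for j in range(i + 1, k):
            if angles[i, j] < max_angle_deg:
                parent[find(j)] = find(i)
    roots = np.array([find(i) for i in range(k)])
    # Relabel the surviving clusters as 0..n_clusters-1
    _, new_ids = np.unique(roots, return_inverse=True)
    return new_ids[labels]
```

In this sketch one would deliberately start with more directions than expected clusters, for example `labels, dirs = cluster_by_direction(ca_row_coords(counts), k=10)`, and then call `merge_by_angle(labels, dirs)` so that nearly parallel directions collapse into a single cluster. That mirrors, in a very reduced form, how the angle between cluster directions can drive the choice of cluster number.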

Version published to 10.1101/2025.03.14.643234v1 on bioRxiv
Mar 17, 2025

Related articles

CellScope: High-Performance Cell Atlas Workflow with Tree-Structured Representation

This article has 7 authors:
1. Bingjie Li
2. Runyu Lin
3. Tianhao Ni
4. Guanao Yan
5. Mannix Burns
6. Jingyi Jessica Li
7. Zhigang Yao
This article has no evaluations. Latest version: Feb 20, 2025

STEAM: Spatial Transcriptomics Evaluation Algorithm and Metric for clustering performance

This article has 4 authors:
1. Samantha Reynoso
2. Courtney Schiebout
3. Revanth Krishna
4. Fan Zhang
This article has no evaluations. Latest version: Feb 19, 2025

Spatially Aware Adjusted Rand Index for Evaluating Spatial Transcriptomics Clustering

This article has 3 authors:
1. Yinqiao Yan
2. Xiangnan Feng
3. Xiangyu Luo
This article has no evaluations. Latest version: Mar 28, 2025