CAdir: Fast Clustering and Visualization of Single-Cell Transcriptomics Data by Direction in CA Space
Abstract
Clustering for single-cell RNA-seq aims to find similar cells and group them into biologically meaningful clusters. However, many available clustering algorithms do not provide the marker genes that define each cluster, cannot infer the number of clusters in an unsupervised manner, and lack tools to easily assess the quality of the label assignments. Clustering quality is therefore commonly evaluated by visually inspecting low-dimensional embeddings such as those produced by UMAP or t-SNE. These embeddings can, however, distort the true cluster structure and are known to change radically depending on the chosen hyperparameters. Determining clustering quality thus still relies heavily on domain knowledge to judge whether cells should be clustered together. To improve the interpretability of clustering results, we developed CAdir (https://github.com/VingronLab/CAdir), a clustering algorithm that can infer the number of clusters in the data, determine cluster-specific genes, and provide easy-to-interpret diagnostic plots. CAdir exploits the geometry induced by correspondence analysis (CA) to cluster cells, as well as cluster-associated genes, based on their direction in CA space. Using the angles between cluster directions, it automatically infers the number of clusters by merging and splitting clusters. A comprehensive set of diagnostic and explanatory plots gives users valuable feedback about the clustering decisions and about the quality of both the final and the intermediate clusters. CAdir scales to even the largest data sets and, in our benchmarking, performs similarly to other state-of-the-art cell clustering algorithms.
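To make the idea of "direction in CA space" concrete, the sketch below computes CA coordinates for a genes × cells count matrix via the SVD of standardized residuals, groups cells by the direction of their coordinate vectors, and measures the angle between cluster directions. This is not CAdir's code: the direction-based clustering step is plain spherical k-means (cosine similarity to unit-length cluster directions) swapped in as a generic stand-in, and the function names and the 30° merge threshold are hypothetical choices made for illustration only.

```python
import numpy as np


def ca_coordinates(counts, k=2):
    """Correspondence analysis of a genes x cells count matrix via SVD.

    Returns principal coordinates for genes (rows) and cells (columns).
    Assumes the matrix has no all-zero rows or columns.
    """
    P = counts / counts.sum()
    r = P.sum(axis=1)                        # gene (row) masses
    c = P.sum(axis=0)                        # cell (column) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    genes = U[:, :k] * d[:k] / np.sqrt(r)[:, None]
    cells = Vt[:k].T * d[:k] / np.sqrt(c)[:, None]
    return genes, cells


def cluster_by_direction(X, n_clusters, n_iter=50, seed=0):
    """Spherical k-means: group points by the direction of their vector from the origin."""
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.maximum(norms, 1e-12)      # keep only the direction of each point
    # Farthest-direction initialization: each new seed is the point least
    # similar (lowest max cosine) to the seeds chosen so far.
    dirs = [unit[rng.integers(len(unit))]]
    for _ in range(1, n_clusters):
        sim = np.max(unit @ np.array(dirs).T, axis=1)
        dirs.append(unit[np.argmin(sim)])
    dirs = np.array(dirs)
    for _ in range(n_iter):
        labels = np.argmax(unit @ dirs.T, axis=1)   # highest cosine similarity
        for j in range(n_clusters):
            members = unit[labels == j]
            if len(members):
                m = members.sum(axis=0)
                dirs[j] = m / np.linalg.norm(m)     # re-normalized mean direction
    return labels, dirs


def angle_deg(u, v):
    """Angle in degrees between two direction vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


def merge_candidates(dirs, max_angle_deg=30.0):
    """Cluster pairs whose directions are closer than a hypothetical angle threshold."""
    return [(i, j) for i in range(len(dirs)) for j in range(i + 1, len(dirs))
            if angle_deg(dirs[i], dirs[j]) < max_angle_deg]


# Toy data: 10 genes x 60 cells; cells 0-29 express genes 0-4 highly,
# cells 30-59 express genes 5-9 highly.
rng = np.random.default_rng(1)
lam = np.full((10, 60), 2.0)
lam[:5, :30] = 20.0
lam[5:, 30:] = 20.0
counts = rng.poisson(lam) + 1    # +1 guards against all-zero rows or columns
genes, cells = ca_coordinates(counts, k=2)
labels, dirs = cluster_by_direction(cells, n_clusters=2)
print(labels, angle_deg(dirs[0], dirs[1]), merge_candidates(dirs))
```

On this toy matrix the two cell groups should land in separate clusters whose directions point roughly in opposite directions, so no pair falls under the merge threshold. Working with angles rather than Euclidean distances means a cell far from the origin and one close to it can still share a cluster if they point the same way, which is the geometric intuition the abstract describes.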