TopOMetry systematically learns and evaluates the latent dimensions of single-cell atlases

Curation statements for this article:
  • Curated by eLife

    eLife Assessment

    This study presents TopOMetry, an important novel dimensionality reduction method that addresses a significant challenge in the analysis of single-cell RNA sequencing data. The authors provide convincing evidence of the method's utility across various tasks, including estimating intrinsic dimensionalities and identifying cell types. The work would benefit from more rigorous validation and a reorganization of the text.

Abstract

A core task in single-cell data analysis is recovering the latent dimensions encoding the genetic and epigenetic landscapes inhabited by cell types and lineages. However, consensus is lacking for optimal modeling and visualization approaches. Here, we propose that these landscapes are ideally modeled as Riemannian manifolds, and present TopOMetry, a computational toolkit based on Laplacian-type operators to learn these manifolds. TopOMetry learns and evaluates dozens of possible representations systematically, eliminating the need to choose a single dimensionality reduction method a priori. The learned visualizations preserve more original information than current PCA-based standards across single-cell and non-biological datasets. TopOMetry allows users to estimate intrinsic dimensionalities and visualize distortions with the Riemannian metric, among other challenging tasks. Illustrating its hypothesis generation power, TopOMetry suggests the existence of dozens of novel T cell subpopulations consistently found across public datasets that correspond to specific clonotypes. TopOMetry is available at https://github.com/davisidarta/topometry.

Article activity feed

  1. eLife Assessment

    This study presents TopOMetry, an important novel dimensionality reduction method that addresses a significant challenge in the analysis of single-cell RNA sequencing data. The authors provide convincing evidence of the method's utility across various tasks, including estimating intrinsic dimensionalities and identifying cell types. The work would benefit from more rigorous validation and a reorganization of the text.

  2. Reviewer #1 (Public review):

    Summary:

    Sidarta-Oliveira et al. present TopOMetry, a novel dimensionality reduction method based on the eigendecomposition of an approximated Laplace-Beltrami operator. In short, TopOMetry is an iterative version of existing spectral methods (e.g., Laplacian Eigenmaps or diffusion maps). It approximates the Laplacian operator twice: once in a "phenotypic space" and once again in the space spanned by the resulting eigenbasis. As a result, the approximated operator contains more information about the manifold, which allows for more robust and accurate downstream analyses.
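
To make the two-stage idea in this summary concrete, the following is a minimal NumPy sketch, not TopOMetry's actual implementation or API. It assumes a Gaussian kernel on a kNN graph with a single global bandwidth and a symmetric normalized graph Laplacian; the function names (`knn_affinity`, `laplacian_eigenbasis`, `two_stage_embedding`) and all parameter defaults are hypothetical and chosen only for illustration.

```python
import numpy as np


def knn_affinity(X, k):
    """Symmetric kNN affinity matrix with a Gaussian kernel.

    Assumption: one global bandwidth (median squared kNN distance).
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbor
    n = X.shape[0]
    rows = np.repeat(np.arange(n), k)
    cols = np.argsort(d2, axis=1)[:, :k].ravel()
    knn_d2 = d2[rows, cols]
    sigma2 = np.median(knn_d2)            # assumes points are not all duplicates
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-knn_d2 / sigma2)
    return np.maximum(W, W.T)             # symmetrize the directed kNN graph


def laplacian_eigenbasis(W, n_components):
    """Eigenvectors of the normalized Laplacian I - D^{-1/2} W D^{-1/2}.

    The first eigenvector is skipped; it is trivial when the graph is connected.
    """
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, evecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return evecs[:, 1:n_components + 1]


def two_stage_embedding(X, k=10, n_eigs=8, n_out=2):
    """Stage 1: Laplacian on the input space. Stage 2: Laplacian on the eigenbasis."""
    basis = laplacian_eigenbasis(knn_affinity(X, k), n_eigs)
    embedding = laplacian_eigenbasis(knn_affinity(basis, k), n_out)
    return basis, embedding


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.uniform(0.0, 2.0 * np.pi, size=200)
    # Noisy circle embedded in 5 dimensions (toy stand-in for expression data)
    X = np.zeros((200, 5))
    X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
    X += 0.05 * rng.normal(size=X.shape)
    basis, embedding = two_stage_embedding(X)
    print(basis.shape, embedding.shape)
```

The dense distance matrix keeps the sketch short but scales quadratically with the number of cells; a practical implementation would rely on approximate nearest-neighbor search and sparse eigensolvers.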

    Strengths:

    (1) The approach was rigorously tested based on synthetic and real single-cell RNA-seq datasets.

    (2) The package is well-made and easily scalable to millions of cells.

    (3) The comprehensive documentation helps the end-users to run desired analyses.

    Weaknesses:

    (1) The method is an extension of current state-of-the-art methods, not a fundamentally new one.

    (2) Considering the target readers, the paper contains a lot of jargon.

  3. Reviewer #2 (Public review):

    Summary:

    This work introduces a novel framework to systematically learn the latent dimensions of single-cell data, grounded in the theory of Riemannian manifolds. The authors demonstrate how this framework can be applied to various important tasks, such as estimating intrinsic dimensionalities and annotating cell types. They did a great job of tackling an important but not yet established problem in the field with a theoretically sound and novel approach. I think that, after more rigorous and comprehensive validation, this work could be impactful.

    Strengths:

    (1) Dimensionality reduction is a routine step in analyzing many kinds of high-dimensional data, such as molecular data. While downstream analysis results depend heavily on this step, existing methods rely on strong assumptions and are sometimes heuristic. The authors present a novel, theoretically grounded approach to address this important problem.

    (2) The authors demonstrated its usability in downstream analysis in a comprehensive manner. In particular, they show evidence suggesting novel T-cell subpopulations.

    (3) I commend the authors for releasing and maintaining their software well with comprehensive documentation. This significantly increases the usability and accessibility of the method.

    Weaknesses:

    (1) To encourage the single-cell community to adopt this method, the authors should more clearly demonstrate its advantages over existing methods. Many single-cell analysis algorithms have been proposed for each task, and some of them are widely used by biologists. However, the comparison in this work is somewhat limited. For example, not all of the methods mentioned in the relevant work paragraph (2nd paragraph) on page 2 are compared, and the reasons for excluding them are not discussed. Also, I am curious how the number of PC dimensions is determined. The choice of 300 PCs on page 11 seems arbitrary. Furthermore, the usefulness of dimension-reduced data also depends heavily on the preceding processing steps, such as highly variable gene selection. I understand it is hard to control all those factors, but I think there is room for improvement.

    (2) The paper lacks experiments that validate the results. It would be beneficial to see additional evaluation settings with better-established ground truths to more strongly demonstrate the method's effectiveness.

    (3) The effect of various parameters, such as those involved in k-nearest neighbors (KNN) or choosing the appropriate Laplacian operator, is not comprehensively explored. How can we ensure the analysis is not overly sensitive to these parameters?

    (4) Batch effects are prevalent in single-cell data. The paper does not adequately address how the proposed method handles this issue.
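
One simple way to probe the parameter sensitivity raised in weakness (3) is to recompute an embedding for several values of k and measure how much each cell's neighborhood changes. The sketch below is an illustration under stated assumptions, not a procedure from the paper: it uses a plain Laplacian Eigenmaps embedding on an unweighted kNN graph rather than TopOMetry itself, and the function names (`spectral_embedding`, `neighborhood_overlap`, `sensitivity_to_k`) are hypothetical.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row (Euclidean, self excluded)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]


def spectral_embedding(X, k, n_components=2):
    """Laplacian Eigenmaps on an unweighted, symmetrized kNN graph."""
    n = X.shape[0]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), knn_indices(X, k).ravel()] = 1.0
    W = np.maximum(W, W.T)
    deg = W.sum(axis=1)
    L = np.eye(n) - W / np.sqrt(np.outer(deg, deg))
    _, evecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return evecs[:, 1:n_components + 1]   # skip the first (trivial) eigenvector


def neighborhood_overlap(A, B, k_eval=15):
    """Mean Jaccard index between each point's k_eval-NN sets in embeddings A and B."""
    scores = []
    for a, b in zip(knn_indices(A, k_eval), knn_indices(B, k_eval)):
        inter = len(set(a) & set(b))
        scores.append(inter / (2 * k_eval - inter))
    return float(np.mean(scores))


def sensitivity_to_k(X, ks, k_eval=15):
    """Overlap between embeddings built with consecutive values of k."""
    embeddings = [spectral_embedding(X, k) for k in ks]
    return {
        (ks[i], ks[i + 1]): neighborhood_overlap(embeddings[i], embeddings[i + 1], k_eval)
        for i in range(len(ks) - 1)
    }
```

Comparing neighbor sets rather than raw coordinates makes the score insensitive to the arbitrary sign and rotation of eigenvectors. Values close to 1 across a range such as `ks = [10, 20, 40]` would suggest the embedding is stable with respect to k on that dataset.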