Uncovering Developmental Lineages from Single-cell Data with Contrastive Poincaré Maps

Nithya Bhasker
Hattie Chung
Louis Boucherie
Vladislav Kim
Stefanie Speidel
Melanie Weber

This article has been Reviewed by the following groups

Listed in

Evaluated articles (Arcadia Science)

Abstract

Single-cell RNA-sequencing (scRNA-seq) enables the study of hierarchical and branching patterns in organismic development at high resolution. Analyzing such data requires visualization and analysis tools that faithfully represent the deep, tree-like structures formed by developmental lineages. Popular Euclidean embedding methods, such as UMAP and t-SNE, as well as domain-specific approaches like PHATE, distort hierarchical relationships in low dimensions, leading to a decrease in performance with growing tree depth. Hyperbolic geometry, which can represent trees with high accuracy in low dimensions, provides a natural remedy. However, existing hyperbolic methods, such as Poincaré Maps (PM), lose accuracy in deeper trees and require extensive feature engineering and memory. We present Contrastive Poincaré Maps (CPM), a self-supervised hyperbolic encoder that leverages contrastive learning in hyperbolic space to efficiently learn robust low-dimensional representations from scRNA-seq data. On synthetic trees with up to 5 generations and 34,000 individuals, CPM cuts distortion by > 99% and requires 13-fold less memory relative to PM. We further demonstrate CPM’s utility on three biological case studies. CPM uncovers accurate hierarchies across 9 developmental stages in the mouse gastrulation dataset comprising 116,312 cells, disentangles global multi-lineage hierarchies in the chicken cardiogenesis dataset while preserving intra-lineage developmental trends, and enables sampling-density-invariant hierarchical analysis in the mouse hematopoiesis dataset. By leveraging hyperbolic geometry in combination with contrastive learning, CPM delivers a scalable framework that preserves hierarchical dependencies in developmental lineages, accelerates exploratory data analysis and opens new avenues for biological insights into developmental processes using scRNA-seq data.

A preliminary version of a part of this work was presented at the ICLR Workshop on Machine Learning for Genomics Explorations (Bhasker et al., 2024).

Arcadia Science
Sep 4, 2025

The first important observation is that state-of-the-art approaches, except CPM, fail to produce an embedding for the complete dataset (containing 100,000 cells), due to their reliance on pairwise distances for the computation of embeddings, which scales quadratically in the number of cells

This doesn't seem quite fair: UMAP and t-SNE were designed to handle datasets of this size and have been widely used to generate embeddings for single-cell datasets of this size and larger. Also, I believe that at least UMAP is sub-quadratic in the number of samples, since it uses an approximate kNN algorithm that scales as O(n log n).
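
To make the scaling argument concrete, here is a rough back-of-the-envelope sketch in Python. It compares the memory needed for a dense pairwise distance matrix with that of a k-nearest-neighbor graph at 100,000 cells. The function names, the 4-byte entries, and k = 15 are illustrative assumptions, not values taken from the preprint or from any particular library's internals.

```python
def dense_pairwise_bytes(n_cells: int, bytes_per_value: int = 4) -> int:
    """Bytes for a full n x n distance matrix (quadratic in n)."""
    return n_cells * n_cells * bytes_per_value


def knn_graph_bytes(n_cells: int, k: int = 15,
                    bytes_per_value: int = 4,
                    bytes_per_index: int = 4) -> int:
    """Bytes for a kNN graph with k (index, distance) pairs per cell (linear in n)."""
    return n_cells * k * (bytes_per_value + bytes_per_index)


n = 100_000  # dataset size quoted in the comment above
print(f"dense pairwise matrix: {dense_pairwise_bytes(n) / 1e9:.1f} GB")
print(f"kNN graph (k=15):      {knn_graph_bytes(n) / 1e6:.1f} MB")
```

Under these assumptions the dense matrix needs about 40 GB while the neighbor graph needs about 12 MB, which is why a method's reliance on full pairwise distances, rather than dataset size alone, determines whether it can embed the complete dataset.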

Arcadia Science
Sep 4, 2025

Figure 3: Space and time complexity analysis.

Minor comment: using a log-log scale for these plots would be helpful, as it would prevent the reference methods (UMAP, t-SNE, PHATE) from appearing as a flat line.
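
A small sketch of why a log-log scale helps: in log-log coordinates a power law appears as a straight line whose slope equals its exponent, so curves of very different magnitude stay distinguishable instead of being flattened against the axis. The helper name and the two cost models below are illustrative assumptions, not measurements from Figure 3.

```python
import math


def loglog_slope(x1: float, y1: float, x2: float, y2: float) -> float:
    """Slope of the line through two points in log-log coordinates."""
    return (math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))


n_small, n_large = 1_000, 100_000

# Hypothetical cost models: quadratic vs. n log n.
quadratic_slope = loglog_slope(n_small, n_small**2, n_large, n_large**2)
nlogn_slope = loglog_slope(n_small, n_small * math.log(n_small),
                           n_large, n_large * math.log(n_large))

print(f"quadratic slope: {quadratic_slope:.2f}")
print(f"n log n slope:   {nlogn_slope:.2f}")
```

In matplotlib, the same view can be obtained by calling `ax.set_xscale("log")` and `ax.set_yscale("log")` on the existing axes.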

Arcadia Science
Sep 3, 2025

On synthetic trees with up to 5 generations and 34,000 individuals, CPM cuts distortion by > 99%

It would be helpful to clarify what this claim is based on, as I can't see anything in Figure 2 that indicates a 99% change in any of the metrics between CPM and PM.

Arcadia Science
Aug 31, 2025

The dataset was normalized to 10000 counts per cell, Log1p transformed and filtered to contain 2000 highly variable genes. The first important observation is that state-of-the-art approaches, except CPM

Does marker‑gene expression change monotonically along the CPM geodesic from root to leaf?
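
For readers unfamiliar with the preprocessing named in the quoted passage (normalization to 10,000 counts per cell, log1p, selection of 2,000 highly variable genes), the steps can be sketched as follows. This is a minimal NumPy illustration, not the authors' pipeline: the function name and toy data are hypothetical, and ranking genes by plain variance is a simplifying assumption standing in for whatever highly-variable-gene criterion the preprint used.

```python
import numpy as np


def preprocess_counts(counts, target_sum=10_000, n_top_genes=2_000):
    """Normalize each cell to target_sum counts, apply log1p, keep top-variance genes."""
    counts = np.asarray(counts, dtype=float)      # shape: (cells, genes)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0                     # guard against empty cells
    normalized = counts / totals * target_sum     # per-cell depth normalization
    logged = np.log1p(normalized)                 # log(1 + x) transform
    n_keep = min(n_top_genes, logged.shape[1])
    # Assumption: rank genes by variance of the log-transformed values.
    ranked = np.argsort(logged.var(axis=0))[::-1]
    kept = np.sort(ranked[:n_keep])               # keep original gene order
    return logged[:, kept], kept


# Toy data standing in for a real count matrix (hypothetical sizes).
rng = np.random.default_rng(0)
toy_counts = rng.poisson(1.0, size=(50, 300))
X, kept_genes = preprocess_counts(toy_counts, n_top_genes=100)
print(X.shape)
```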

Version published to 10.1101/2025.08.22.671789 on bioRxiv
Aug 27, 2025

Related articles

Reconstructing Waddington's Landscape from Data

This article has 4 authors:
1. Dillon J. Cislo
2. M Joaquina Delás
3. James Briscoe
4. Eric D. Siggia
This article has no evaluations
Latest version Aug 13, 2025

GeneSys: Generative Modeling of Developmental System

This article has 5 authors:
1. Che-Wei Hsu
2. Chia-Yu Chen
3. Trevor M. Nolan
4. Philip N. Benfey
5. Uwe Ohler
This article has no evaluations
Latest version Aug 25, 2025

Intrinsic dimensionality of single-cell transcriptomic data reveals potency landscapes during cell reprogramming

This article has 5 authors:
1. Maddalena Staiano
2. Niccolò Cirone
3. Marta Biondo
4. Matteo Osella
5. Antonio Scialdone
This article has no evaluations
Latest version Jul 24, 2025
