Generative Models Validation via Manifold Recapitulation Analysis

Nicolò Lazzaro
Gianluca Leonardi
Raffaele Marchesi
Massimiliano Datres
Anna Saiani
Jacopo Tessadori
Alejandro Granados
Johan Henriksson
Marco Chierici
Giuseppe Jurman
Toma Tebaldi
Gabriele Sales

Listed in

Evaluated articles (Arcadia Science)

Abstract

Summary

Single-cell transcriptomics increasingly relies on nonlinear models to harness the dimensionality and growing volume of data. However, most model validation focuses on local manifold fidelity (e.g., Mean Squared Error and other data likelihood metrics), with little attention to the global manifold topology these models should ideally be learning. To address this limitation, we have implemented a robust scoring pipeline aimed at validating a model’s ability to reproduce the entire reference manifold. The Python library Cytobench demonstrates this approach, along with Jupyter Notebooks and an example dataset to help users get started with the workflow. Manifold recapitulation analysis can be used to develop and assess models intended to learn the full network of cellular dynamics, as well as to validate their performance on external datasets.

Availability

A Python library implementing the scoring pipeline is available via pip. Its source code can be inspected on GitHub, alongside Jupyter Notebooks demonstrating its application.

Contact

nlazzaro@fbk.eu or toma.tebaldi@unitn.it

Version published to 10.1101/2024.10.23.619602v3 on bioRxiv
Nov 18, 2024
Arcadia Science
Oct 31, 2024

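$$\mathrm{PED}(X, Y) = \frac{1}{2}\,\mathbb{E}\big[\inf_{\pi}\big(\lVert d(X, X') - d(Y, X')_{\pi} \rVert_{p}\big)\big] + \frac{1}{2}\,\mathbb{E}\big[\inf_{\pi}\big(\lVert d(Y, Y') - d(X, Y')_{\pi} \rVert \dots$$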

Sorry if this is already clear, but I'm a little unsure about the notation. Is X the input data (i.e., empirical results from a scRNA-seq experiment) and Y the generated distribution? If so, are X' and Y' subsets of the respective distributions?
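
One possible reading of the quoted formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Cytobench implementation: it assumes X is the reference sample and Y the generated sample, that X' and Y' are held-out anchor points drawn from the same distributions as X and Y, that d is Euclidean distance, that π is a permutation matching the two distance vectors, and that the second term (cut off in the excerpt) mirrors the first. The names `matched_gap`, `half_term` and `ped_sketch` are hypothetical.

```python
import math
import random


def matched_gap(u, v, p=2.0):
    """inf over permutations pi of ||u - v_pi||_p for equal-length vectors.

    For p >= 1 the optimal one-dimensional matching pairs sorted values,
    so sorting both vectors attains the infimum.
    """
    if len(u) != len(v):
        raise ValueError("both distance vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(sorted(u), sorted(v))) ** (1.0 / p)


def half_term(sample, other, anchors, p=2.0):
    """Empirical E[ inf_pi || d(sample, a) - d(other, a)_pi ||_p ] over anchors a."""
    total = 0.0
    for a in anchors:
        d_sample = [math.dist(s, a) for s in sample]  # distances sample -> anchor
        d_other = [math.dist(o, a) for o in other]    # distances other -> anchor
        total += matched_gap(d_sample, d_other, p)
    return total / len(anchors)


def ped_sketch(x, y, x_anchors, y_anchors, p=2.0):
    """Symmetric average of the two half terms (second term assumed by symmetry)."""
    return 0.5 * half_term(x, y, x_anchors, p) + 0.5 * half_term(y, x, y_anchors, p)


# Toy usage: two 2-D Gaussian clouds, the second one shifted along the first axis.
rng = random.Random(0)
x = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
x_prime = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10)]
y = [(rng.gauss(3, 1), rng.gauss(0, 1)) for _ in range(50)]
y_prime = [(rng.gauss(3, 1), rng.gauss(0, 1)) for _ in range(10)]

score_same = ped_sketch(x, x, x_prime, x_prime)   # identical samples give 0.0
score_shift = ped_sketch(x, y, x_prime, y_prime)  # shifted cloud gives a positive score
```

Under this reading, each anchor point contributes the gap between two sorted distance profiles, so the score is zero when both samples look identical from every anchor and grows as the generated sample drifts away from the reference.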

Version published to 10.1101/2024.10.23.619602v2 on bioRxiv
Oct 29, 2024
Version published to 10.1101/2024.10.23.619602v1 on bioRxiv
Oct 26, 2024
