Generative Models Validation via Manifold Recapitulation Analysis

This article has been Reviewed by the following groups

Read the full article See related articles

Listed in

Log in to save this article

Abstract

Summary

Single-cell transcriptomics increasingly relies on nonlinear models to harness the dimensionality and growing volume of data. However, most model validation focuses on local manifold fidelity (e.g., Mean Squared Error and other data likelihood metrics), with little attention to the global manifold topology these models should ideally be learning. To address this limitation, we have implemented a robust scoring pipeline aimed at validating a model’s ability to reproduce the entire reference manifold. The Python library Cytobench demonstrates this approach, along with Jupyter Notebooks and an example dataset to help users get started with the workflow. Manifold recapitulation analysis can be used to develop and assess models intended to learn the full network of cellular dynamics, as well as to validate their performance on external datasets.

Availability

A Python library implementing the scoring pipeline has been made available via pip and can be inspected at GitHub alongside some Jupyter Notebooks demonstrating its application.

Contact

nlazzaro@fbk.eu or toma.tebaldi@unitn.it

Supplementary information

Supplementary data are available at Bioinformatics online.

Article activity feed

  1. PED(X, Y ) = 12 E[infπ(∥d(X, X′) − d(Y, X′)π ∥p)]+ 12 E[infπ(∥d(Y, Y ′) − d(X, Y ′)π ∥

    Sorry if this is clear, but I'm a little unclear on the notation. Is X the input data (so empirical results from a scRNA-seq experiment) and Y the generated dist? If so then are X' and Y' subsets of the respective distributions?