Generative Models Validation via Manifold Recapitulation Analysis
This article has been reviewed by the following groups
Listed in
- Evaluated articles (Arcadia Science)
Abstract
Summary
Single-cell transcriptomics increasingly relies on nonlinear models to harness the dimensionality and growing volume of data. However, most model validation focuses on local manifold fidelity (e.g., Mean Squared Error and other data likelihood metrics), with little attention to the global manifold topology these models should ideally be learning. To address this limitation, we have implemented a robust scoring pipeline aimed at validating a model’s ability to reproduce the entire reference manifold. The Python library Cytobench demonstrates this approach, along with Jupyter Notebooks and an example dataset to help users get started with the workflow. Manifold recapitulation analysis can be used to develop and assess models intended to learn the full network of cellular dynamics, as well as to validate their performance on external datasets.
Availability
A Python library implementing the scoring pipeline is available via pip and can be inspected on GitHub, alongside Jupyter Notebooks demonstrating its application.
Contact
nlazzaro@fbk.eu or toma.tebaldi@unitn.it
Supplementary information
Supplementary data are available at Bioinformatics online.
Article activity feed
-
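$$\mathrm{PED}(X, Y) = \tfrac{1}{2}\,\mathbb{E}\big[\inf_{\pi} \lVert d(X, X') - d(Y, X')\pi \rVert_p\big] + \tfrac{1}{2}\,\mathbb{E}\big[\inf_{\pi} \lVert d(Y, Y') - d(X, Y')\pi \rVert_p\big]$$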
Sorry if this is already explained elsewhere, but I'm a little unclear on the notation. Is X the input data (i.e., empirical results from a scRNA-seq experiment) and Y the generated distribution? If so, are X′ and Y′ subsets of the respective distributions?
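One possible reading of the quoted formula can be sketched in code. This is only a sketch under stated assumptions, none of which is confirmed by the article or by Cytobench: X and Y are equally sized point samples (reference and generated), X′ and Y′ range over the individual points of X and Y, d is Euclidean distance, and π is a permutation of the distance vector. Under that reading, the infimum over π for p ≥ 1 is reached by pairing sorted distances, so no explicit search over permutations is needed. The names `ped_sketch` and `_one_sided` are hypothetical and do not come from the library.

```python
import numpy as np


def _one_sided(A, B, p=2):
    """Average, over reference points a' in A, of
    min over permutations pi of || d(A, a') - d(B, a')_pi ||_p.

    Assumptions (not taken from the article): Euclidean d, len(A) == len(B).
    """
    if A.shape != B.shape:
        raise ValueError("this sketch assumes equally sized samples")
    # Row i holds the distances from reference point A[i] to every point of A / of B.
    d_ref_to_A = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)
    d_ref_to_B = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    # For 1-D vectors and p >= 1, the optimal permutation pairs sorted values.
    diff = np.sort(d_ref_to_A, axis=1) - np.sort(d_ref_to_B, axis=1)
    return np.linalg.norm(diff, ord=p, axis=1).mean()


def ped_sketch(X, Y, p=2):
    """Symmetric combination of the two one-sided terms, each weighted by 1/2."""
    return 0.5 * _one_sided(X, Y, p) + 0.5 * _one_sided(Y, X, p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))            # stand-in for reference cells
    Y = rng.normal(loc=1.0, size=(50, 5))   # stand-in for generated cells
    print(ped_sketch(X, X), ped_sketch(X, Y))
```

Under these assumptions the score is zero when the two samples are identical up to row order, and it is symmetric in X and Y. Whether the article samples X′ and Y′ this way is exactly what the question above asks.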