Fast and interpretable quantification of biological shape heterogeneity via stratified Wasserstein kernel
Abstract
Modern imaging technologies produce vast collections of cellular and subcellular structures, calling for principled methods that enable shape comparison across individuals and populations. We introduce the stratified Wasserstein framework, which treats each shape as an unstructured point cloud and embeds it into Euclidean space via ranked local distance profiles. This embedding yields an isometry-invariant Euclidean distance and a positive-definite kernel for population analysis, with a consistent sample-based estimator that scales to large datasets in near-quadratic time. By leveraging kernel methods, the framework enables statistically rigorous tasks such as nonparametric hypothesis testing, while providing both theoretical guarantees and interpretability. We demonstrate the framework’s applicability to large-scale biological datasets. Analyzing 2D cancer cell contours, we quantify population-level discrepancies and identify the representative cells that contribute most strongly to the observed differences. Using 3D volumes of cell envelopes and nuclei, we reveal progression patterns that capture morphological changes both across cell populations and at the level of individual shapes. These results establish a simple and principled tool for population-level biological shape analysis, with potential impact across diverse domains of computational imaging and data science.
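The abstract does not spell out the construction, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a specific embedding. Each point's "ranked local distance profile" is its sorted distances to all other points. A "stratum" is a fixed relative rank within those profiles. Each stratum is summarized by its quantile function on a fixed grid, so that squared Euclidean distance between embeddings approximates a sum of squared 1D Wasserstein-2 distances. Because only pairwise distances are used, the embedding is invariant to rotations, reflections, and translations, and it costs quadratic time in the number of points per shape. A Gaussian kernel on these Euclidean vectors is positive definite, which allows a standard kernel two-sample test (biased squared MMD with a permutation p-value). The Gaussian kernel, the median-heuristic bandwidth, the MMD test, the toy ellipse data, and all function names (`embed_shape`, `gaussian_gram`, `mmd2`, `mmd_permutation_test`, `noisy_ellipse`) are hypothetical choices made for illustration.

```python
import numpy as np


def embed_shape(points, n_strata=8, n_levels=16):
    """Hypothetical stratified distance-profile embedding of an (n, d) cloud, n >= 2.

    Only pairwise distances are used, so the result is unchanged by
    rotations, reflections and translations of the cloud.
    """
    pts = np.asarray(points, dtype=float)
    # All pairwise Euclidean distances: O(n^2) time and memory.
    dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    # Ranked local distance profile of each point (self-distance 0 dropped).
    profiles = np.sort(dist, axis=1)[:, 1:]
    # Assumed strata: fixed relative ranks within every profile (nearest ... farthest).
    ranks = np.linspace(0.0, 1.0, n_strata)
    strata = np.quantile(profiles, ranks, axis=1)        # (n_strata, n)
    # Quantile function of each stratum across all points, on a midpoint grid.
    levels = (np.arange(n_levels) + 0.5) / n_levels
    q = np.quantile(strata, levels, axis=1).T            # (n_strata, n_levels)
    # Scaling so squared Euclidean distance approximates sum of W2^2 over strata.
    return (q / np.sqrt(n_levels)).ravel()


def gaussian_gram(X, Y, bandwidth):
    """Gaussian kernel matrix; positive definite on Euclidean embeddings."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))


def mmd2(K, n):
    """Biased squared MMD; rows/cols [:n] are sample A, the rest sample B."""
    return K[:n, :n].mean() + K[n:, n:].mean() - 2.0 * K[:n, n:].mean()


def mmd_permutation_test(A, B, n_perm=200, seed=0):
    """Kernel two-sample test on embedded shapes; returns (MMD^2, p-value)."""
    Z = np.vstack([A, B])
    n = len(A)
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    bandwidth = np.sqrt(np.median(sq[sq > 0]))  # median heuristic (an assumption)
    K = gaussian_gram(Z, Z, bandwidth)
    observed = mmd2(K, n)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        # Relabel shapes at random and recompute the statistic.
        p = rng.permutation(len(Z))
        hits += int(mmd2(K[np.ix_(p, p)], n) >= observed)
    return observed, (hits + 1) / (n_perm + 1)


def noisy_ellipse(a, b, n_points, rng):
    """Hypothetical toy 'cell contour': noisy points on a randomly rotated ellipse."""
    t = rng.uniform(0.0, 2.0 * np.pi, n_points)
    pts = np.c_[a * np.cos(t), b * np.sin(t)] + 0.01 * rng.normal(size=(n_points, 2))
    theta = rng.uniform(0.0, 2.0 * np.pi)
    R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
    return pts @ R.T


# Usage: compare a population of circles with a population of elongated ellipses.
rng = np.random.default_rng(1)
circles = np.array([embed_shape(noisy_ellipse(1.0, 1.0, 100, rng)) for _ in range(20)])
ellipses = np.array([embed_shape(noisy_ellipse(1.5, 0.6, 100, rng)) for _ in range(20)])
stat, p_value = mmd_permutation_test(circles, ellipses)
print(f"MMD^2 = {stat:.3f}, p = {p_value:.3f}")
```

Summarizing each stratum by quantiles is what makes the embedding Euclidean. In one dimension, the Wasserstein-2 distance between two distributions equals the L2 distance between their quantile functions, so standard Euclidean kernels can be applied directly. The permutation test is one generic way to turn such a kernel into a nonparametric hypothesis test. The paper's actual estimator and guarantees may differ.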