Praxis-BGM: Clustering of Omics Data Using Semi-Supervised Transfer Learning for Gaussian Mixture Models via Natural-Gradient Variational Inference
Abstract
Motivation
High-dimensional omics data are typically measured on limited sample sizes, which challenges model-based clustering methods such as Gaussian mixture models, often leading to instability and poor generalization under complex mixture structures. To address these limitations, we developed Praxis-BGM, a natural-gradient variational inference framework for Gaussian mixture models that enables semi-supervised transfer learning by incorporating an informative prior Gaussian mixture model derived from large-scale reference data with robust cluster structures. This prior can encode cluster-specific means, covariance structures, and structural connectivity patterns, and is updated using the target data with variational inference to improve clustering in small-sample settings.
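As a rough illustration of how a reference-derived mixture can guide the clustering of target samples, the following minimal sketch computes soft cluster assignments (responsibilities) for a single one-dimensional observation under a fixed Gaussian mixture. It is not the Praxis-BGM API: the function names and the example weights, means, and variances are hypothetical, and the actual method operates on high-dimensional data with full variational updates.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def responsibilities(x, weights, means, variances):
    """Soft assignment of x to each component of a 1-D Gaussian mixture.

    The mixture parameters stand in for a prior learned on reference data
    (hypothetical values; an assumption of this sketch).
    """
    logs = [
        math.log(w) + log_normal_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(l - top) for l in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical two-cluster reference prior, centered at -1 and +1.
prior_weights = [0.5, 0.5]
prior_means = [-1.0, 1.0]
prior_vars = [1.0, 1.0]

r_mid = responsibilities(0.0, prior_weights, prior_means, prior_vars)
r_right = responsibilities(3.0, prior_weights, prior_means, prior_vars)
```

A sample midway between the two reference clusters is split evenly, while a sample far to the right is assigned almost entirely to the second cluster.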
Results
We derive natural-gradient updates for standard parameters and assess feature-level contributions to posterior clustering via Bayes factors. Implemented in the Python library JAX for accelerator-oriented computation, Praxis-BGM is computationally efficient and scalable. Across extensive simulations and two real-world applications (breast cancer bulk transcriptomics for subtype recovery and single-cell transcriptomics for cross-platform label transfer), Praxis-BGM improves posterior clustering performance, stability, and biological interpretability, even when priors are partially mismatched.
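To give a sense of what a natural-gradient variational update looks like, here is a minimal sketch for the simplest conjugate case: a Gaussian posterior over a single cluster mean with known observation variance. In this setting the natural gradient of the evidence lower bound with respect to the natural parameters is the gap between the current parameters and "prior plus data statistics", so each step is a convex combination of the two. The helper names, the step size `rho`, and the fixed `sigma2` are assumptions of this sketch, and it does not reproduce the updates derived for Praxis-BGM.

```python
def from_mean_var(m, s2):
    """Natural parameters (m / s2, -1 / (2 * s2)) of a Gaussian N(m, s2)."""
    return (m / s2, -1.0 / (2.0 * s2))


def to_mean_var(lam):
    """Recover (mean, variance) from Gaussian natural parameters."""
    s2 = -1.0 / (2.0 * lam[1])
    return lam[0] * s2, s2


def natural_gradient_step(lam, lam_prior, xs, sigma2=1.0, rho=0.5):
    """One natural-gradient step on q(mu) in natural coordinates.

    For this conjugate model the natural gradient of the ELBO is
    (lam_prior + data_stats) - lam, so the step interpolates between
    the current parameters and that target with step size rho.
    """
    n = len(xs)
    target = (
        lam_prior[0] + sum(xs) / sigma2,
        lam_prior[1] - n / (2.0 * sigma2),
    )
    return tuple((1.0 - rho) * l + rho * t for l, t in zip(lam, target))


# Hypothetical informative prior N(0, 1) and two target observations.
prior = from_mean_var(0.0, 1.0)
data = [2.0, 2.0]

# A full step (rho = 1) lands on the exact conjugate posterior.
full_mean, full_var = to_mean_var(
    natural_gradient_step(prior, prior, data, rho=1.0)
)

# Damped steps (rho = 0.5) approach the same posterior geometrically.
lam = prior
for _ in range(60):
    lam = natural_gradient_step(lam, prior, data, rho=0.5)
damped_mean, damped_var = to_mean_var(lam)
```

With these inputs the posterior mean is 4/3 and the posterior variance is 1/3: the prior acts like one pseudo-observation at 0, which pulls the estimate away from the sample mean of 2. This shrinkage is what lets an informative prior stabilize estimates when the target sample is small.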
Availability and Implementation
Praxis-BGM is freely available at https://github.com/ContiLab-usc/Praxis-BGM, and an archival version is available on Zenodo at https://doi.org/10.5281/zenodo.19657680.
Contact
qiranjia@usc.edu
Supplementary Information
Supplementary materials are available with the manuscript submission.