Praxis-BGM: Clustering of Omics Data Using Semi-Supervised Transfer Learning for Gaussian Mixture Models via Natural-Gradient Variational Inference


Abstract

Motivation

High-dimensional omics data are typically measured on limited sample sizes, which challenges model-based clustering methods such as Gaussian mixture models, often leading to instability and poor generalization under complex mixture structures. To address these limitations, we developed Praxis-BGM, a natural-gradient variational inference framework for Gaussian mixture models that enables semi-supervised transfer learning by incorporating an informative prior Gaussian mixture model derived from large-scale reference data with robust cluster structures. This prior can encode cluster-specific means, covariance structures, and structural connectivity patterns, and is updated using the target data with variational inference to improve clustering in small-sample settings.
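The idea of letting a reference-derived prior anchor the cluster means while the target data update them can be illustrated with a deliberately simplified sketch. This is not the Praxis-BGM algorithm: it assumes a shared isotropic variance, fixed mixture weights, and a conjugate-style mean update in which each prior mean counts as `prior_strength` pseudo-observations. The function names, `prior_strength`, and the toy data are all hypothetical.

```python
import jax.numpy as jnp
from jax.scipy.special import logsumexp


def responsibilities(x, means, var, log_weights):
    """Soft cluster assignments under an isotropic Gaussian mixture (assumption)."""
    # Squared distance from every sample (n, d) to every cluster mean (k, d).
    sq_dist = jnp.sum((x[:, None, :] - means[None, :, :]) ** 2, axis=-1)
    log_p = log_weights[None, :] - 0.5 * sq_dist / var
    # Normalize in log space for numerical stability.
    return jnp.exp(log_p - logsumexp(log_p, axis=1, keepdims=True))


def update_means(x, prior_means, prior_strength, var, log_weights, n_iter=20):
    """Alternate soft assignment and a prior-shrunk mean update (hypothetical helper)."""
    means = prior_means
    for _ in range(n_iter):
        r = responsibilities(x, means, var, log_weights)  # (n, k)
        nk = r.sum(axis=0)                                # effective counts, (k,)
        weighted_sum = r.T @ x                            # (k, d)
        # Prior means act as `prior_strength` pseudo-observations per cluster.
        means = (prior_strength * prior_means + weighted_sum) / (
            prior_strength + nk
        )[:, None]
    return means, r


# Toy target data: two small, well-separated groups (made up for illustration).
x = jnp.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
prior_means = jnp.array([[1.0, 1.0], [4.0, 4.0]])  # stand-in for a reference-derived prior
log_weights = jnp.log(jnp.array([0.5, 0.5]))

means, r = update_means(x, prior_means, prior_strength=2.0, var=1.0,
                        log_weights=log_weights)
```

With only two samples per cluster, the updated means land between the prior means and the sample means, which is the small-sample stabilizing effect that an informative prior is meant to provide.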

Results

We derive natural-gradient updates for standard parameters and assess feature-level contributions to posterior clustering via Bayes factors. Implemented in the Python library JAX for accelerator-oriented computation, Praxis-BGM is computationally efficient and scalable. Across extensive simulations and two real-world applications (breast cancer bulk transcriptomics for subtype recovery and single-cell transcriptomics for cross-platform label transfer), Praxis-BGM improves posterior clustering performance, stability, and biological interpretability, even when priors are partially mismatched.
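To give a sense of what a natural-gradient variational update looks like, the following minimal JAX sketch treats a much simpler case than the full mixture model: a single Gaussian mean with known noise variance and a Gaussian prior. In such a conjugate model the natural gradient of the evidence lower bound with respect to the natural parameters is the difference between the conjugate-optimal parameters and the current ones, so each step is a convex combination of the two. The function names, step size, and data are hypothetical, and this is not the update derived in the manuscript.

```python
import jax.numpy as jnp


def to_natural(mean, var):
    """Natural parameters of N(mean, var): (mean / var, -1 / (2 var))."""
    return jnp.array([mean / var, -0.5 / var])


def from_natural(eta):
    """Recover (mean, var) from the natural parameters."""
    var = -0.5 / eta[1]
    return eta[0] * var, var


def natural_gradient_step(eta, x, prior_mean, prior_var, noise_var, step_size):
    """One natural-gradient step on the ELBO for a conjugate Gaussian-mean model."""
    n = x.shape[0]
    # Conjugate-optimal natural parameters: prior plus data sufficient statistics.
    eta_target = jnp.array([
        prior_mean / prior_var + jnp.sum(x) / noise_var,
        -0.5 * (1.0 / prior_var + n / noise_var),
    ])
    # Natural gradient is (eta_target - eta); step_size = 1 jumps to the optimum.
    return eta + step_size * (eta_target - eta)


# Toy observations (made up for illustration).
x = jnp.array([1.8, 2.1, 2.4, 1.9])
prior_mean, prior_var, noise_var = 0.0, 1.0, 1.0

eta = to_natural(prior_mean, prior_var)  # initialize q at the prior
for _ in range(50):
    eta = natural_gradient_step(eta, x, prior_mean, prior_var, noise_var,
                                step_size=0.5)
post_mean, post_var = from_natural(eta)
```

Because the iterates stay in natural-parameter space, the variance remains positive for any step size in (0, 1], and the sequence converges to the exact conjugate posterior for this toy model.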

Availability and Implementation

Praxis-BGM is freely available at https://github.com/ContiLab-usc/Praxis-BGM, and an archival version is available on Zenodo at https://doi.org/10.5281/zenodo.19657680.

Contact

qiranjia@usc.edu

Supplementary Information

Supplementary materials are available with the manuscript submission.
