MOFA-FLEX: A Factor Model Framework for Integrating Omics Data with Prior Knowledge

Abstract

Latent factor models are first-line analysis approaches for single- and multi-omics data, essential for data integration, alignment, and biological signal discovery. To cater for new technologies and experimental designs, bespoke extensions of factor models have been proposed, incorporating spatial structure, temporal dynamics and the noise characteristics of single-cell assays. However, the development of tailored methods and software for individual use cases is laborious and requires advanced statistical and domain expertise, posing a significant barrier to users.

To address this, we propose MOFA-FLEX, a flexible and modular factor analysis framework designed for customisable modelling across diverse multi-omics data scenarios. Built on probabilistic programming, MOFA-FLEX unifies previously isolated extensions of factor analysis (including flexible priors, non-negativity constraints, supervision signals, and alternative data likelihoods), allowing models to be configured declaratively without manual engineering. Additionally, MOFA-FLEX features a novel domain knowledge module that informs latent factors with, and connects them to, gene programs.
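To make two of the ingredients above concrete (non-negativity constraints and factors tied to prior gene programs), here is a minimal NumPy sketch of a factor model Y ≈ Z Wᵀ whose loadings W are restricted to a binary gene-program mask. It is not MOFA-FLEX's implementation or API: the abstract states that MOFA-FLEX is built on probabilistic programming, whereas this sketch swaps in classical Lee-Seung multiplicative updates for non-negative matrix factorisation. The function name `masked_nmf` and the hard binary mask are hypothetical illustrations, and a hard mask cannot tolerate noisy annotations the way a probabilistic prior might.

```python
import numpy as np


def masked_nmf(Y, mask, n_iter=200, eps=1e-9, seed=0):
    """Hypothetical sketch: fit Y ~= Z @ W.T with Z, W >= 0 and W zero outside `mask`.

    Y    : (n_cells, n_genes) non-negative data matrix
    mask : (n_genes, n_factors) binary matrix; mask[g, k] = 1 if gene g is
           annotated to gene program k (assumed prior knowledge)
    Returns factor scores Z, loadings W, and the reconstruction error per iteration.
    """
    rng = np.random.default_rng(seed)
    n_cells, _ = Y.shape
    n_factors = mask.shape[1]
    Z = rng.random((n_cells, n_factors))       # factor scores, one row per cell
    W = rng.random((Y.shape[1], n_factors)) * mask  # loadings, zero outside the mask
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative, and an entry
        # that starts at zero stays zero, so the gene-program mask is preserved.
        Z *= (Y @ W) / (Z @ (W.T @ W) + eps)
        W *= (Y.T @ Z) / (W @ (Z.T @ Z) + eps)
        errors.append(np.linalg.norm(Y - Z @ W.T))  # Frobenius reconstruction error
    return Z, W, errors


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    mask = np.zeros((6, 2))
    mask[:3, 0] = 1  # hypothetical program 0: genes 0-2
    mask[3:, 1] = 1  # hypothetical program 1: genes 3-5
    # Synthetic data generated from the same masked, non-negative structure.
    Y = rng.random((20, 2)) @ (rng.random((6, 2)) * mask).T
    Z, W, errors = masked_nmf(Y, mask)
    print("first/last error:", errors[0], errors[-1])
    print("loadings outside mask all zero:", np.all(W[mask == 0] == 0))
```

The mask is the simplest way to "connect" a factor to a gene program: each factor can only load on its annotated genes. A probabilistic framework would typically soften this into a prior, so that genes missing from or wrongly included in an annotation can still be corrected by the data.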

We demonstrate MOFA-FLEX across multiple applications, showing (i) improved robustness in recovering gene programs from noisy prior knowledge in scRNA-seq data; (ii) effective disentanglement of technical and biological variation in multi-omic CITE-seq data; and (iii) tailored spatial modelling that reveals spatially organised, disease-associated gene programs in breast cancer.