ACMTF-R: supervised multi-omics data integration uncovering shared and distinct outcome-associated variation

Abstract

The rapid growth of high-dimensional biological data has necessitated advanced data fusion techniques to integrate and interpret complex multi-omics and longitudinal datasets. Advanced Coupled Matrix and Tensor Factorization (ACMTF) has emerged as a powerful framework for uncovering common, local, and distinct sources of variation across datasets. However, ACMTF cannot model variation linked to a dependent variable, which limits its applicability to studies investigating biological phenotypes. N-way Partial Least Squares (NPLS) is a supervised method that identifies variation related to a dependent variable, but it cannot identify common, local, and distinct sources of variation across multiple datasets. To bridge the gap between data exploration and prediction, we introduce ACMTF-Regression (ACMTF-R), an extension of ACMTF that incorporates a regression term, allowing for the simultaneous decomposition of multi-way data while explicitly capturing variation associated with an outcome variable.

We present a detailed mathematical formulation of ACMTF-R, including its optimisation algorithm and implementation. Through extensive simulations, we systematically evaluate its ability to recover a small y-related component shared between multiple blocks, its robustness to noise, and the impact of the tuning parameter (π), which controls the balance between data exploration and outcome prediction. Our results demonstrate that ACMTF-R robustly identifies the y-related component and correctly identifies outcome-associated shared and distinct variation, which distinguishes it from existing approaches such as NPLS and ACMTF.
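
To illustrate how a single tuning parameter can trade data exploration against outcome prediction, the following is a minimal sketch of a π-weighted objective. It is not the formulation from the article: it assumes matrix-only blocks that share one subject-mode factor matrix, omits tensor blocks and any penalty terms, and assumes that π weights the decomposition term while (1 - π) weights the regression term. The function name `fusion_regression_loss` and all variable names are hypothetical.

```python
import numpy as np


def fusion_regression_loss(blocks, y, A, loadings, weights, rho, pi):
    """Hypothetical pi-weighted loss (illustrative only, not the article's objective).

    blocks   : list of (n x p_k) data matrices sharing the subject mode
    y        : (n,) outcome vector
    A        : (n x R) shared subject-mode factor matrix
    loadings : list of (p_k x R) feature loadings, one per block
    weights  : list of (R,) component weights per block; a zero weight
               means the component is absent from that block
    rho      : (R,) regression coefficients linking A to y
    pi       : value in [0, 1]; assumed here to weight the decomposition term
    """
    fit = 0.0
    for X, B, lam in zip(blocks, loadings, weights):
        X_hat = (A * lam) @ B.T  # equals A @ diag(lam) @ B.T
        fit += np.sum((X - X_hat) ** 2) / np.sum(X ** 2)
    y_hat = A @ rho
    pred = np.sum((y - y_hat) ** 2) / np.sum(y ** 2)
    return pi * fit + (1.0 - pi) * pred


# Toy data: component 0 is shared by both blocks and related to y,
# component 1 is distinct to the first block and unrelated to y.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 2))
loadings = [rng.normal(size=(5, 2)), rng.normal(size=(7, 2))]
weights = [np.array([1.0, 1.0]), np.array([1.0, 0.0])]
rho = np.array([0.5, 0.0])
blocks = [(A * lam) @ B.T for B, lam in zip(loadings, weights)]
y = A @ rho

# Perturbing y leaves the decomposition term untouched but raises the
# regression term, so the loss reacts only when pi < 1.
y_noisy = y + rng.normal(scale=0.5, size=y.shape)
loss_exploration_only = fusion_regression_loss(
    blocks, y_noisy, A, loadings, weights, rho, pi=1.0
)
loss_prediction_only = fusion_regression_loss(
    blocks, y_noisy, A, loadings, weights, rho, pi=0.0
)
```

Under these assumptions, π = 1 ignores the outcome entirely and behaves like an unsupervised decomposition, while π = 0 scores only the prediction of y from the shared factors. Intermediate values weigh both terms.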

To validate its applicability in a real-world setting, we apply ACMTF-R to a multi-omics dataset integrating human milk microbiome, human milk metabolome, and infant faecal microbiome data, investigating how maternal pre-pregnancy BMI affects microbial and metabolic signatures. ACMTF-R successfully identifies novel mother-infant relationships associated with maternal pre-pregnancy BMI, underscoring its utility in multi-omics research. Our findings establish ACMTF-R as a versatile tool for multi-way data fusion, offering new insights into complex biological systems by integrating common, local, and distinct variation in the context of a dependent variable.
