ACMTF-R: supervised multi-omics data integration uncovering shared and distinct outcome-associated variation
Abstract
The rapid growth of high-dimensional biological data has necessitated advanced data fusion techniques to integrate and interpret complex multi-omics and longitudinal datasets. Advanced Coupled Matrix and Tensor Factorization (ACMTF) has emerged as a powerful framework for uncovering common, local, and distinct sources of variation across datasets. However, ACMTF cannot model variation linked to a dependent variable, limiting its applicability to studies investigating biological phenotypes. N-way Partial Least Squares (NPLS) is a supervised method that identifies variation in relation to a dependent variable but cannot identify common, local, and distinct sources of variation across multiple datasets. To bridge the gap between data exploration and prediction, we introduce ACMTF-Regression (ACMTF-R), an extension of ACMTF that incorporates a regression term, allowing for the simultaneous decomposition of multi-way data while explicitly capturing variation associated with an outcome variable.
We present a detailed mathematical formulation of ACMTF-R, including its optimisation algorithm and implementation. Through extensive simulations, we systematically evaluate its ability to recover a small y-related component shared between multiple blocks, its robustness to noise, and the impact of the tuning parameter (π), which controls the balance between data exploration and outcome prediction. Our results demonstrate that ACMTF-R robustly identifies the y-related component and correctly recovers outcome-associated shared and distinct variation, distinguishing it from existing approaches such as NPLS and ACMTF.
To validate its applicability in a real-world setting, we apply ACMTF-R to a multi-omics dataset integrating human milk microbiome, human milk metabolome, and infant faecal microbiome data, investigating how maternal pre-pregnancy BMI affects microbial and metabolic signatures. ACMTF-R successfully identifies novel mother-infant relationships associated with maternal pre-pregnancy BMI, underscoring its utility in multi-omics research. Our findings establish ACMTF-R as a versatile tool for multi-way data fusion, offering new insights into complex biological systems by integrating common, local, and distinct variation in the context of a dependent variable.
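The balance that the tuning parameter π strikes between data exploration and outcome prediction can be illustrated with a minimal sketch. The abstract does not state the loss function, so everything below is an assumption made for illustration: the convention that π weights the block reconstruction error and (1 − π) weights the regression error, the normalisation of each term, and the names `cp_reconstruct` and `acmtfr_style_loss`, which are hypothetical. The sparsity penalties and norm constraints that an ACMTF-type model would normally include are omitted, and no optimisation is performed.

```python
import numpy as np


def cp_reconstruct(weights, factors):
    """Rebuild a matrix or tensor from weighted rank-one components."""
    letters = "ijk"[: len(factors)]  # supports 2-way and 3-way blocks
    spec = "r," + ",".join(f"{c}r" for c in letters) + "->" + letters
    return np.einsum(spec, weights, *factors)


def acmtfr_style_loss(blocks, y, A, mode_factors, lambdas, rho, pi):
    """Illustrative pi-weighted loss (assumed form, not the paper's exact one).

    blocks       : list of data blocks sharing their first (subject) mode
    A            : shared subject-mode factor matrix, shape (n, R)
    mode_factors : per block, list of the remaining factor matrices
    lambdas      : per block, component weights of length R
    rho          : regression coefficients mapping A to y
    """
    recon_error = 0.0
    for X, others, lam in zip(blocks, mode_factors, lambdas):
        X_hat = cp_reconstruct(lam, [A, *others])
        recon_error += np.sum((X - X_hat) ** 2) / np.sum(X ** 2)
    y_error = np.sum((y - A @ rho) ** 2) / np.sum(y ** 2)
    # Assumed convention: pi = 1 is purely exploratory, pi = 0 purely predictive.
    return pi * recon_error + (1.0 - pi) * y_error


# Toy data: a 3-way block and a matrix block coupled through A.
rng = np.random.default_rng(0)
n, R = 20, 2
A = rng.normal(size=(n, R))
B1, C1 = rng.normal(size=(5, R)), rng.normal(size=(4, R))
B2 = rng.normal(size=(6, R))
lam1 = np.array([1.0, 0.5])
lam2 = np.array([1.0, 0.0])  # component 2 is absent from block 2 (distinct)
rho = np.array([0.0, 2.0])   # only component 2 is related to y

X1 = cp_reconstruct(lam1, [A, B1, C1])
X2 = cp_reconstruct(lam2, [A, B2])
y = A @ rho

args = ([X1, X2], y, A, [[B1, C1], [B2]], [lam1, lam2])
for pi in (0.0, 0.5, 1.0):
    # With rho set to zero, the loss only registers the misfit when pi < 1.
    print(pi, acmtfr_style_loss(*args, np.zeros(R), pi))
```

Under this assumed convention, setting π to 1 ignores the outcome entirely, while smaller values of π increasingly penalise components that fail to explain y.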