Longitudinal Microbiome-based Interpretable Machine Learning for Identification of Time-Varying Biomarkers in Early Prediction of Disease Outcomes

Yifan Dai
Yunzhi Qian
Yixiang Qu
Wyliena Guan
Jialiu Xie
Duan Wang
Catherine Butler
Stuart Dashper
Ian Carroll
Kimon Divaris
Yufeng Liu
Di Wu

Abstract

Information generated from longitudinally sampled microbial data has the potential to illuminate important aspects of the development and progression of many human conditions and diseases. Identifying microbial biomarkers and their time-varying effects can not only advance our understanding of pathogenetic mechanisms but also facilitate early diagnosis and guide the optimal timing of interventions. However, longitudinal predictive modeling of highly noisy and dynamic microbial data (e.g., metagenomics) poses analytical challenges. To overcome these challenges, we introduce LP-Micro, a robust and interpretable machine-learning-based framework for longitudinal microbiome analysis that encompasses: (i) longitudinal microbial feature screening via a polynomial group lasso, (ii) disease outcome prediction via machine learning methods (e.g., XGBoost, deep neural networks), and (iii) interpretable association testing between time points, microbial features, and disease outcomes via permutation feature importance. We demonstrate in simulations that LP-Micro not only identifies incident disease-related microbiome taxa but also offers improved prediction accuracy compared to existing approaches. Applications of LP-Micro to two longitudinal microbiome studies, with clinical outcomes of childhood dental disease and weight loss following bariatric surgery, yield consistently high prediction accuracy. The identified critical early predictive time points are informative and aligned with clinical expectations.
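
To make step (iii) concrete, the following is a minimal sketch of permutation feature importance applied to whole time points. It is not the LP-Micro implementation: the data are synthetic, a small least-squares model fitted by gradient descent stands in for XGBoost or a neural network, and every name (`fit_linear`, `timepoint_importance`, the sizes `N`, `T`, `P`) is a hypothetical choice made for illustration. The idea shown is only that jointly shuffling all taxa measured at one time point across subjects, and recording how much held-out error rises, gives a score for how much the fitted model relies on that time point.

```python
import random

rng = random.Random(0)
N, T, P = 200, 3, 2  # subjects, time points, taxa (hypothetical sizes)

# Synthetic stand-in for longitudinal abundances; column index = t * P + k.
X = [[rng.gauss(0, 1) for _ in range(T * P)] for _ in range(N)]
# Assumed ground truth: only taxon 0 at time point 1 drives the outcome.
y = [2.0 * row[1 * P + 0] + rng.gauss(0, 0.1) for row in X]
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]


def predict(model, row):
    w, b = model
    return b + sum(wj * xj for wj, xj in zip(w, row))


def fit_linear(X, y, lr=0.1, epochs=200):
    """Least-squares fit by batch gradient descent (placeholder for any model)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict((w, b), xi) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
    return w, b


def mse(model, X, y):
    return sum((predict(model, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(y)


def timepoint_importance(model, X, y, cols, n_repeats=20, rng=rng):
    """Mean rise in MSE when the columns in `cols` are shuffled together."""
    base = mse(model, X, y)
    total = 0.0
    for _ in range(n_repeats):
        perm = list(range(len(X)))
        rng.shuffle(perm)
        X_perm = [row[:] for row in X]
        for i, src in enumerate(perm):
            for c in cols:
                X_perm[i][c] = X[src][c]  # subject i receives subject src's values
        total += mse(model, X_perm, y) - base
    return total / n_repeats


model = fit_linear(X_train, y_train)
importances = [
    timepoint_importance(model, X_test, y_test, [t * P + k for k in range(P)])
    for t in range(T)
]
for t, imp in enumerate(importances):
    print(f"time point {t}: mean MSE increase = {imp:.3f}")
```

In this toy setting the score for time point 1 should dominate, because that is where the simulated signal was placed. Shuffling the taxa of a time point as a group, rather than one column at a time, keeps their within-subject correlation intact, which is one reasonable way to attribute importance to a time point as a whole.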

Version published to 10.1101/2024.10.18.619118 on bioRxiv
Oct 22, 2024

Related articles

PRESSnet: a novel framework for patient stratification and biomarker discovery using clinical knowledge graphs

This article has 11 authors:
1. Jake Cohen-Setton
2. Shruti Shikhare
3. Ioannis Kagiampakis
4. Domingo Salazar
5. Miguel Goncalves
6. Elizabeth Coker
7. Sanddhya Jayabalan
8. Damian Bikiel
9. Ben Sidders
10. Etai Jacob
11. Krishna Bulusu
This article has no evaluations
Latest version Dec 15, 2025

A pathomics model for predicting lactate-related subtypes and their biological mechanisms in lung adenocarcinoma

This article has 8 authors:
1. Lingling Zhou
2. Wen Liu
3. Ye Pan
4. Wan Lei
5. Lingfei Yan
6. Jingfang Zou
7. Kejian Qian
8. Fei Wang
This article has no evaluations
Latest version Jan 16, 2026

Machine Learning–Driven Discovery of Host Genetic Factors for Paratuberculosis in Goats Within the One Health Framework

This article has 11 authors:
1. Yalçın Yaman
2. Ahmet ESER
3. Devran Coşkun
4. Ramazan Aymaz
5. Yiğit Emir Kişi
6. Murat Keleş
7. Serdar Yağcı
8. Özgül Gülaydın
9. Serkan Süleyman Şengül
10. Kıvanç İrak
11. Memiş Bolacalı
This article has no evaluations
Latest version Jan 30, 2026
