Systematic benchmarking of foundation models and classical baselines for microbiome-based disease prediction

Jin Mu
Zheng-Zheng Tang
Guanhua Chen

Abstract

Background: Microbiome-based disease prediction is often hindered by sparse, compositional features and substantial inter-study heterogeneity. Foundation models and LLM-derived representations could, in principle, improve robustness and cross-cohort generalization, but their utility for microbiome prediction has not been systematically benchmarked.

Results: We benchmarked classical machine-learning baselines (regularized logistic regression and random forests), standard numerical feature representations, GPT-derived semantic embeddings, and two foundation-model paradigms: a general-purpose tabular foundation model (TabPFN) and a microbiome-specific foundation model (MGM). Using 83 publicly curated case–control cohorts spanning 20 diseases profiled by 16S rRNA sequencing and shotgun metagenomics, we assessed performance under three settings: intra-cohort cross-validation, cross-cohort transfer (train on one cohort, test on others), and leave-one-study-out (LOSO) validation. GPT-derived semantic embeddings consistently underperformed standard numerical representations. TabPFN achieved strong out-of-the-box performance and competitive cross-cohort robustness, but did not consistently outperform well-tuned classical baselines across cohorts. MGM’s performance was disease-dependent and generally lagged behind the strongest tabular baselines, suggesting that current microbiome-specific pretraining at genus resolution does not yet confer a consistent advantage under study heterogeneity. Batch-effect correction methods provided limited and non-uniform improvements in LOSO evaluations.

Conclusions: In this large-scale benchmark, current foundation-model approaches offer, at best, modest gains over strong classical baselines for microbiome-based disease prediction. Our results highlight that standard numerical representations remain difficult to beat, general-purpose tabular foundation models can provide strong out-of-the-box performance under domain shift, and microbiome-specific foundation models may require advances in pretraining scale, taxonomic resolution, and architecture to translate pretraining into reliable cross-study generalization.
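The three evaluation settings named in the abstract differ only in how samples are split by study. The sketch below is an illustration of those splitting schemes, not the authors' code: the function names, the toy `study_ids` list, and the unshuffled, unstratified fold assignment are all assumptions made for this example, and it assumes every sample belongs to the same disease.

```python
from itertools import permutations

# Hypothetical toy input: one study label per sample (all from one disease).
study_ids = ["A", "A", "A", "B", "B", "C", "C", "C", "C"]


def intra_cohort_folds(study_ids, study, k=5):
    """Intra-cohort CV: k folds drawn from within a single study.

    Folds are assigned round-robin without shuffling or stratification,
    which is a simplification for illustration.
    """
    idx = [i for i, s in enumerate(study_ids) if s == study]
    for fold in range(k):
        test = idx[fold::k]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        yield train, test


def cross_cohort_pairs(study_ids):
    """Cross-cohort transfer: train on one study, test on each other study."""
    by_study = {}
    for i, s in enumerate(study_ids):
        by_study.setdefault(s, []).append(i)
    for source, target in permutations(sorted(by_study), 2):
        yield source, target, by_study[source], by_study[target]


def loso_splits(study_ids):
    """Leave-one-study-out: hold out each study once, train on the rest."""
    for held_out in sorted(set(study_ids)):
        train = [i for i, s in enumerate(study_ids) if s != held_out]
        test = [i for i, s in enumerate(study_ids) if s == held_out]
        yield held_out, train, test
```

Each generator yields index lists that can be passed to any classifier's fit and predict steps. In LOSO and cross-cohort transfer, no study contributes samples to both sides of a split, which is what makes those settings a test of generalization under study heterogeneity.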

Version published to 10.21203/rs.3.rs-8912605/v1 on Research Square
Feb 25, 2026

Related articles

A Block-Scaled Early Fusion Framework for Multi-Omics Integration Reveals Microbiome-Immune Bridge Features in Inflammatory Bowel Disease

This article has 15 authors:
1. Cole Myers
2. Aaron Clarke
3. Max Hill
4. Myana Anderson
5. Adam Herman
6. Amanda Hayward
7. Parthasarathy Rangarajan
8. Byron Vaughn
9. Brian T. Steffen
10. Elizabeth R. Lusczek
11. Christopher Staley
12. Cyrus Jahansouz
13. Sabarinathan Ramachandran
14. Todd W Costantini
15. Geetha Saarunya
This article has no evaluations. Latest version: Mar 18, 2026

Methods for Continuous-Valued Training Data Generation from Genome-Scale Metabolic Models: Partial-Inhibition FBA with Mixed Essentiality Sampling, Applied to ESKAPE Drug Target Curation

This article has 1 author:
1. Byeongsoo Kang
This article has no evaluations. Latest version: Apr 13, 2026

Inflammation-Linked Aging Signals in Frozen Single-Cell Foundation Models: Donor-Aware Detection and Robustness Testing

This article has 1 author:
1. Ihor Kendiukhov
This article has no evaluations. Latest version: Apr 13, 2026
