Enhanced Prediction of Gut Microbiome–Related Diseases Using Hybrid Machine Learning Models

Abstract

The human gut, containing 100 trillion microbes, is often called the “second brain” because it influences many physiological functions. Advances in bioinformatics and sequencing technologies have allowed researchers to explore the diversity and functional implications of the gut microbiota (GM), which has been strongly associated with a variety of diseases. Microbial imbalance, or dysbiosis, can serve as a biomarker for early detection and prognosis of disease. Artificial Intelligence and Machine Learning (AI/ML) methods are used extensively to predict GM-associated diseases, but they seldom translate into practical real-world outcomes, which calls for robust AI/ML models that are applicable in real-world scenarios. We therefore designed two stacking-based ensemble architectures (EM1 and EM2) that integrate multiple ML algorithms to improve disease prediction accuracy. The GM datasets were split into training and test sets and fed into the proposed two-layer ensemble models, which combine the outputs of standardized base learners via a meta-classifier. This strengthens classification robustness and supports consistent performance across diverse datasets. Both hybrid ensemble models outperformed all baseline and deep learning models, with average accuracies of 0.87 and 0.84, respectively. By combining multiple learners, the proposed ensemble models attain higher accuracy and robustness than traditional single-algorithm approaches on complex GM datasets.
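
The two-layer design described above can be illustrated with a minimal sketch. It assumes scikit-learn's StackingClassifier as the stacking mechanism; the base learners (a random forest and a scaled SVM), the logistic-regression meta-classifier, and the synthetic data are illustrative assumptions, not the EM1 or EM2 configurations, which the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a samples-by-taxa abundance table (assumption: not real GM data).
X, y = make_classification(
    n_samples=300, n_features=50, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Layer 1: base learners (hypothetical choices for illustration).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
]

# Layer 2: a meta-classifier trained on out-of-fold predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Training the meta-classifier on cross-validated predictions (the `cv=5` argument) keeps it from simply memorizing base-learner outputs on the training set, which is the usual motivation for stacking over simple voting.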

Key messages

  • Development of stacking-based hybrid ensemble models (EM) that integrate different AI/ML algorithms to improve prediction accuracy for gut microbiome (GM)-associated diseases.

  • Use of independent GM datasets with preprocessing methods such as SMOTE and PCA to address class imbalance and high dimensionality.

  • The proposed EM architectures outperform existing state-of-the-art AI/ML methods for GM disease prediction in most cases (highest prediction accuracy: 0.87 with EM1 and 0.84 with EM2).

  • Cross-cohort validation demonstrates high prediction accuracy and robustness (AUC values close to 0.98 for EM1 and 0.99 for EM2).

  • Together, these results demonstrate the effectiveness of EM frameworks for GM-associated disease prediction, paving the way for applications in precision medicine.
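
The SMOTE step named in the second key message can be sketched in plain Python. This is a simplified illustration of the interpolation idea behind SMOTE, not the authors' preprocessing code; the function name `smote_like`, the toy minority samples, and the choice of k are all hypothetical. In practice a maintained implementation, such as the SMOTE class in the imbalanced-learn library, would normally be used.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a randomly chosen minority sample and one of its k nearest minority
    neighbours (the core idea of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy minority-class samples with two features (hypothetical values).
minority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.4, 0.2]]
new_points = smote_like(minority, n_new=6)
print(len(new_points))
```

Because each synthetic point lies on a line segment between two real minority samples, oversampling adds variety without duplicating existing rows. It should be applied to the training split only, so that no synthetic information leaks into the test set.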
