An Explainable Maximum Expected Utility-Optimized Ensemble for Early Prediction of Gestational Diabetes Mellitus

Abstract

Gestational Diabetes Mellitus (GDM) is a growing global health concern associated with adverse maternal and neonatal outcomes. Early and accurate prediction of GDM can significantly enhance clinical decision-making and reduce complications. In this study, we present an explainable, Maximum Expected Utility (MEU)-optimized stacking ensemble model for early GDM prediction using structured clinical and demographic data. The ensemble integrates five heterogeneous base classifiers (Logistic Regression, Random Forest, XGBoost, Support Vector Machine, and K-Nearest Neighbors), combined through a meta-learner that is further calibrated using the MEU principle to maximize decision utility. The model was trained and evaluated on a balanced dataset of 3,525 antenatal records, preprocessed using standard scaling and manual oversampling. Optimal base model parameters were determined through RandomizedSearchCV. In our experiments, the proposed ensemble outperformed the other models evaluated, achieving an accuracy of 98.96%, precision of 99.30%, recall of 98.61%, F1-score of 98.95%, AUC of 0.9996, and an MEU score of 7.26. SHAP-based explainability revealed that oral glucose tolerance test (OGTT), polycystic ovary syndrome (PCOS), body mass index (BMI), and high-density lipoprotein (HDL) cholesterol were the most influential features in GDM prediction, providing transparency and clinical interpretability. This study demonstrates that integrating ensemble learning with MEU optimization and explainable AI techniques can offer a robust and clinically actionable tool for early GDM detection. The model’s high sensitivity and decision-theoretic design make it particularly suitable for deployment in low-resource settings where early intervention is critical. Future work will focus on real-time deployment, external validation, and longitudinal modeling to enhance temporal prediction and generalizability across diverse populations.
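
To make the decision-theoretic step more concrete, the following is a minimal sketch of how a Maximum Expected Utility rule might pick a classification threshold from a model's predicted probabilities. It is not the authors' implementation: the abstract does not report the utility values or how the MEU score is computed, so the `UTILITY` table, the function names `expected_utility` and `meu_threshold`, and the candidate threshold grid are all assumptions chosen for illustration.

```python
from typing import Dict, Optional, Sequence

# Hypothetical utilities (NOT taken from the paper): a missed GDM case
# (false negative) is penalised far more heavily than a false alarm.
UTILITY: Dict[str, float] = {"tp": 10.0, "tn": 1.0, "fp": -1.0, "fn": -20.0}


def expected_utility(
    probs: Sequence[float],
    labels: Sequence[int],
    threshold: float,
    utility: Dict[str, float] = UTILITY,
) -> float:
    """Mean utility per record when GDM is predicted for probs >= threshold."""
    total = 0.0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            total += utility["tp"]
        elif predicted_positive and y == 0:
            total += utility["fp"]
        elif not predicted_positive and y == 1:
            total += utility["fn"]
        else:
            total += utility["tn"]
    return total / len(labels)


def meu_threshold(
    probs: Sequence[float],
    labels: Sequence[int],
    candidates: Optional[Sequence[float]] = None,
    utility: Dict[str, float] = UTILITY,
) -> float:
    """Return the candidate threshold with the highest expected utility.

    Ties are resolved in favour of the lowest threshold, because max()
    keeps the first maximal element of an ascending candidate grid.
    """
    if candidates is None:
        # Assumed grid: 0.01, 0.02, ..., 0.99
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: expected_utility(probs, labels, t, utility))


if __name__ == "__main__":
    # Toy validation-set probabilities and true labels (illustrative only).
    probs = [0.10, 0.40, 0.35, 0.80]
    labels = [0, 0, 1, 1]
    best = meu_threshold(probs, labels)
    print(best, expected_utility(probs, labels, best))
```

In practice the probabilities would come from the stacking meta-learner on held-out data. With a utility table that weights false negatives heavily, as assumed here, the selected threshold tends to fall below 0.5, which is one way a decision-theoretic design can favor sensitivity.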
