An Explainable Stacking Ensemble with Cost-Sensitive Threshold Optimization for Gestational Diabetes Risk Prediction

Abstract

Gestational Diabetes Mellitus (GDM) is associated with increased risks of maternal and neonatal complications, and early identification of high-risk pregnancies is essential for improving clinical outcomes. This study develops an explainable stacking ensemble model for GDM prediction using routinely collected clinical and demographic variables. The ensemble integrates Logistic Regression, Random Forest, XGBoost, Support Vector Machine, and K-Nearest Neighbors as base learners, combined through a logistic meta-learner whose decision threshold is calibrated using a Maximum Expected Utility (MEU) framework to reflect the higher clinical cost of false-negative outcomes. The dataset consisted of 3,525 antenatal records and was processed using stratified splitting, fold-level K-Nearest Neighbors imputation, and class balancing applied only to the training subset to minimise the risk of information leakage. Model performance was evaluated on a hold-out test set using accuracy, precision, recall, F1-score, area under the ROC curve (AUC), and MEU-based utility metrics. The ensemble achieved an accuracy of 98.96%, precision of 99.30%, recall of 98.61%, F1-score of 98.95%, and an AUC of 0.9996, while the MEU-optimised threshold increased decision utility relative to default probability thresholds. SHAP-based explainability analysis showed that OGTT, BMI, PCOS, HDL cholesterol, and diastolic blood pressure were the most influential predictors. The results indicate that integrating ensemble learning, cost-sensitive threshold optimisation, and explainable AI can enhance predictive performance and interpretability for GDM risk stratification. However, as the findings are based on a single public dataset without external validation, the results should be interpreted cautiously. Future work will focus on temporal and cross-cohort validation, calibration analysis, and prospective evaluation in real clinical workflows to strengthen generalisability and translational value.
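The cost-sensitive thresholding step described in the abstract can be illustrated with a minimal sketch. The abstract does not report the utility values or the search procedure used, so everything below is an assumption: the `UTILITY` table (with a false negative penalised five times more heavily than a false positive), the grid search in `meu_threshold`, and the toy probabilities are all hypothetical. The idea is simply to pick the threshold on the meta-learner's predicted probabilities that maximises average utility rather than defaulting to 0.5.

```python
from typing import Dict, Sequence

# Hypothetical utilities (not taken from the paper): a missed GDM case (fn)
# is assumed to cost five times as much as a false alarm (fp).
UTILITY: Dict[str, float] = {"tp": 1.0, "tn": 1.0, "fp": -1.0, "fn": -5.0}


def expected_utility(
    probs: Sequence[float],
    labels: Sequence[int],
    threshold: float,
    utility: Dict[str, float] = UTILITY,
) -> float:
    """Mean utility per case when predicting positive for p >= threshold."""
    total = 0.0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            total += utility["tp"]
        elif predicted_positive and y == 0:
            total += utility["fp"]
        elif not predicted_positive and y == 1:
            total += utility["fn"]
        else:
            total += utility["tn"]
    return total / len(labels)


def meu_threshold(
    probs: Sequence[float],
    labels: Sequence[int],
    utility: Dict[str, float] = UTILITY,
    grid_size: int = 101,
) -> float:
    """Grid-search the threshold in [0, 1] with the highest mean utility.

    Ties are resolved in favour of the lowest threshold on the grid.
    """
    candidates = [i / (grid_size - 1) for i in range(grid_size)]
    return max(candidates, key=lambda t: expected_utility(probs, labels, t, utility))


def analytic_threshold(utility: Dict[str, float] = UTILITY) -> float:
    """Utility-maximising threshold if the probabilities are well calibrated."""
    gain_neg = utility["tn"] - utility["fp"]  # benefit of a correct negative call
    gain_pos = utility["tp"] - utility["fn"]  # benefit of a correct positive call
    return gain_neg / (gain_neg + gain_pos)


if __name__ == "__main__":
    # Toy validation-set probabilities and labels, for illustration only.
    probs = [0.05, 0.2, 0.3, 0.4, 0.6, 0.9]
    labels = [0, 0, 1, 0, 1, 1]
    best = meu_threshold(probs, labels)
    print("empirical MEU threshold:", best)
    print("utility at MEU threshold:", expected_utility(probs, labels, best))
    print("utility at default 0.5:", expected_utility(probs, labels, 0.5))
    print("analytic threshold:", analytic_threshold())
```

In practice the threshold would be selected on validation folds rather than on the hold-out test set, so that the reported test metrics stay unbiased. The closed-form value from `analytic_threshold` is only appropriate if the model's probabilities are calibrated, which the abstract lists as future work.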
