Evaluation of Machine Learning Models for Early Prediction of Gestational Diabetes Using Retrospective Electronic Health Records from Current and Previous Pregnancies

Mark Germaine
Amy C O’Higgins
Brendan Egan
Graham Healy

Abstract

Objective

To assess the performance of machine learning (ML) models in predicting gestational diabetes mellitus (GDM) using electronic health record (EHR) data from the first antenatal visit, and to determine whether incorporating data from previous pregnancies improves performance.

Methods and Analysis

In this retrospective cohort study, several ML models were developed to predict GDM using EHR data (n=27,561; GDM 11.6%) from nulliparous and multiparous populations. Past pregnancy data (n=4,005) were incorporated to improve future (preconception) GDM predictions. Four ML algorithms were evaluated: logistic regression (LR), random forest (RF), XGBoost (XGB), and explainable boosting machine (EBM). Performance was evaluated on an internal validation set in terms of model discrimination (AUROC) and model calibration (calibration plots, slope, and intercept).
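As an illustration of the two evaluation criteria named above, the following minimal sketch computes AUROC from ranks and estimates a calibration slope and intercept by regressing the observed outcome on the logit of the predicted risk. It is not the authors' code: the function names, the toy data, and the choice to estimate slope and intercept jointly in one logistic fit are assumptions, since the abstract does not specify the estimation procedure.

```python
import math

def auroc(y_true, y_score):
    """AUROC via the rank-sum (Mann-Whitney) identity; tied scores share an average rank."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def _logit(p, eps=1e-12):
    p = min(max(p, eps), 1 - eps)  # guard against log(0)
    return math.log(p / (1 - p))

def _sigmoid(z):
    return 1 / (1 + math.exp(-z))

def calibration_slope_intercept(y_true, y_prob, n_iter=50, tol=1e-10):
    """Fit y ~ intercept + slope * logit(p) by Newton-Raphson; returns (slope, intercept)."""
    x = [_logit(p) for p in y_prob]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y_true):
            mu = _sigmoid(a + b * xi)
            w = mu * (1 - mu)
            g_a += yi - mu
            g_b += (yi - mu) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return b, a

# Hypothetical toy data: three risk groups of 10 with observed event rates 0.2, 0.5, 0.8.
y_toy = ([1] * 2 + [0] * 8) + ([1] * 5 + [0] * 5) + ([1] * 8 + [0] * 2)
p_calibrated = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
# Overconfident predictions whose log-odds are exactly twice the true log-odds.
p_extreme = [1 / 17] * 10 + [0.5] * 10 + [16 / 17] * 10

print(auroc(y_toy, p_calibrated))
print(calibration_slope_intercept(y_toy, p_calibrated))
print(calibration_slope_intercept(y_toy, p_extreme))
```

In this toy setup both sets of predictions rank individuals identically, so they share the same AUROC, while only the calibration slope distinguishes the well-calibrated predictions from the overconfident ones. This is why discrimination and calibration are reported separately.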

Results

The Feature Agnostic Model (all features) achieved an AUROC of 0.832 (slope 0.967; intercept −0.088) with LR, similar to more complex models such as XGB (AUROC 0.828; slope 0.976; intercept −0.072) and EBM (AUROC 0.829; slope 0.939; intercept −0.131). The Sequential Model, which included first trimester and previous pregnancy data, demonstrated the highest predictive performance, with XGB achieving an AUROC of 0.904 (slope 0.618; intercept −0.136). Subset models using top clinical features maintained strong performance; in particular, the Sequential Model achieved an AUROC of 0.897 (slope 1.137; intercept 0.161) using only eight features.
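To make the reported slopes and intercepts more concrete, the sketch below applies a logistic recalibration map, sigmoid(intercept + slope × logit(p)), to a few example predicted risks. It assumes the reported values come from a logistic recalibration fit of this form, which the abstract does not state explicitly; the function name `recalibrate` and the example risks are illustrative only. Under that assumption, a slope below 1 suggests the raw predictions are too extreme, and a slope above 1 suggests they are too conservative.

```python
import math

def recalibrate(p, slope, intercept):
    """Map a predicted risk p to sigmoid(intercept + slope * logit(p))."""
    z = intercept + slope * math.log(p / (1 - p))
    return 1 / (1 + math.exp(-z))

# (slope, intercept) pairs as reported in the Results.
feature_agnostic_lr = (0.967, -0.088)
sequential_xgb = (0.618, -0.136)
sequential_subset = (1.137, 0.161)

# Hypothetical predicted risks, chosen only for illustration.
for p in (0.05, 0.20, 0.50, 0.80):
    print(
        p,
        round(recalibrate(p, *feature_agnostic_lr), 3),
        round(recalibrate(p, *sequential_xgb), 3),
        round(recalibrate(p, *sequential_subset), 3),
    )
```

With the Sequential XGB values, high predicted risks are pulled down and low predicted risks are pulled up, whereas the LR Feature Agnostic values leave predictions nearly unchanged. This illustrates how a model can have the highest AUROC and still need recalibration before its risk estimates are used directly.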

Conclusion

Incorporating previous pregnancy data improved ML performance for GDM prediction. Further, using a subset of clinically relevant features yielded similar performance, supporting potential integration into clinical decision support systems. These findings highlight the promise of early GDM risk identification in both nulliparous and multiparous populations; however, additional research, including external validation and clinical trials, is needed to determine the models’ practical utility and effect on maternal and neonatal outcomes.

KEY MESSAGES

What is already known on this topic?

Machine learning (ML) approaches have been used to identify gestational diabetes mellitus (GDM) risk in early pregnancy, but most studies use only data from the current pregnancy and do not leverage prior obstetric history.

What this study adds

Incorporating data from previous pregnancies substantially improved the predictive performance of ML models. Reducing the models to a small set of key features still yielded good discrimination and calibration, suggesting feasibility for real-time clinical use at, or even before, the first antenatal visit.

How this study might affect research, practice or policy

These findings support the possibility of earlier GDM detection and intervention, and highlight the need for external validation and prospective trials to confirm broader utility, inform clinical workflows, and guide policy on integrating ML-driven risk prediction into routine maternity care.

Version published to 10.1101/2025.05.12.25327431v1 on medRxiv
May 13, 2025

Related articles

Development and External Validation of a Non-invasive Early Gestational Diabetes Mellitus Prediction Model Integrating Social Network Variables: A Machine Learning-Based Prospective Cohort Study

This article has 10 authors:
1. Qianqian Li
2. Yalin Tang
3. Xiuling Yang
4. Tingqiang Song
5. Guozheng Wei
6. Ruting Gu
7. Yueshuai Pan
8. Jingyuan Wang
9. Yi Li
10. Lili Wei
Latest version: Jun 25, 2025

Machine Learning and Deep Learning Approaches for Predicting Diabetes Progression: A Comparative Analysis

This article has 3 authors:
1. Oluwafisayo Babatope Ayoade
2. Seyed Shahrestani
3. Chun Ruan
Latest version: Jun 26, 2025

Prediction of high-risk pregnancy based on machine learning algorithms

This article has 5 authors:
1. Xinyu Pi
2. Junzhi Wang
3. Liangliang Chu
4. Guochun Zhang
5. Wenli Zhang
Latest version: May 4, 2025
