The power of machine learning models in predicting gestational diabetes mellitus
Abstract
Background: Gestational diabetes mellitus (GDM) is associated with various adverse pregnancy outcomes for both mother and baby. Precise prediction of GDM is crucial for public health. Machine learning (ML) models have emerged as powerful tools for disease prediction, demonstrating superior results compared to traditional methods. We therefore set out to assess the effectiveness of ML models for predicting GDM.

Methods: This retrospective study used the clinical and demographic information of all mothers admitted for delivery to a tertiary referral center in Bandar Abbas, Iran, from January 2020 to January 2022. The input data were used to train seven ML models. To evaluate the diagnostic potential of each model, we used accuracy, area under the curve (AUC), precision, recall, and F1 score.

Results: The incidence rate of GDM was 20.9%. Models using boosting techniques, particularly CatBoost and XGBoost, outperformed the other models in predicting GDM; however, their predictive power was not high. These two models yielded the best average AUC (approximately 0.64 to 0.66) and the highest average recall (approximately 0.43 to 0.52). Across all performance metrics, CatBoost demonstrated the best capability in predicting GDM, with acceptable accuracy and average recall. BMI, maternal age, gestational age, maternal education, maternal residency, fetal gender, history of abortion, and parity were among the 10 most predictive features for GDM. Conversely, factors such as intrauterine fetal demise, infertility, cardiovascular issues, COVID-19, substance abuse, multiple births, anemia, and a history of stillbirth had nearly zero importance across nearly all models, owing to their low prevalence or lack of variability.

Conclusion: The CatBoost model performed best in predicting GDM; however, its predictive power was modest. Additional studies are needed to draw firmer conclusions.
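As an illustration of the five evaluation metrics named in the Methods, the following is a minimal sketch in plain Python. The abstract does not say which software the authors used, so the function names (`confusion_counts`, `classification_metrics`, `roc_auc`), the 0.5 decision threshold, and the labels and scores are all hypothetical toy values, not study data.

```python
# Toy illustration only: labels and scores below are invented, not study data.

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = GDM, 0 = no GDM; scores are model probabilities.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
scores = [0.8, 0.3, 0.6, 0.4, 0.2, 0.1, 0.7, 0.9, 0.5, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = roc_auc(y_true, scores)
print(metrics)
```

With an imbalanced outcome such as the 20.9% incidence reported here, accuracy alone can look acceptable even when recall is low, which is one reason to read AUC and recall alongside it.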