Development and External Validation of a Non-invasive Early Gestational Diabetes Mellitus Prediction Model Integrating Social Network Variables: A Machine Learning-Based Prospective Cohort Study
Abstract
Objective: Few studies have examined the prediction of gestational diabetes mellitus (GDM) in early pregnancy from the perspective of social networks. This study aimed to build a machine learning (ML) model that predicts GDM early in pregnancy from social network variables and other non-invasive factors.

Methods: This prospective cohort study enrolled 2,433 pregnant individuals from four branches of Qingdao University Affiliated Hospital, who formed the model development cohort and the external validation cohort. First, univariate analysis was conducted in SPSS for variable selection. The statistically significant variables were then entered into seven algorithms: Logistic Regression (LR), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), k-Nearest Neighbors (k-NN), Support Vector Classifier (SVC), Adaptive Boosting (AdaBoost) and Multilayer Perceptron (MLP). The models were trained with stratified 10-fold cross-validation so that each fold preserved the class distribution. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC-ROC), accuracy, recall, specificity and F1 score. External validation assessed predictive performance and clinical utility using ROC curves, calibration curves and decision curve analysis (DCA).

Results: The model development cohort included 1,752 participants, and the geographically independent external validation cohort included 681. Univariate logistic regression identified 22 risk factors for GDM, covering sociodemographic characteristics, social network characteristics (such as the size of the structural network and the semi-annual total contact frequency) and personal behavioral characteristics. The XGBoost model showed the best overall performance (AUC = 0.992), significantly outperforming the other algorithms. External validation further confirmed that the model generalizes well (AUC = 0.940). The calibration curve closely followed the ideal curve. DCA showed a higher net benefit when the prediction model was applied, and the model showed good accuracy and stability.

Conclusions: This study developed a high-performance GDM prediction model by integrating social network variables with conventional predictors. The XGBoost-based model achieved exceptional discrimination and specificity in external validation, demonstrating that social network metrics significantly enhance risk stratification beyond traditional clinical factors. The findings reveal the potential value of social network factors in predicting GDM and offer new ideas and methods for building GDM prediction models with higher predictive ability and more stable performance.
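The evaluation steps named in the Methods can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code: the study presumably relied on library implementations, and every function name here (stratified_folds, classification_metrics, auc_roc, net_benefit) is a hypothetical helper introduced only for illustration. The sketch shows how stratified folds keep the class ratio, how recall, specificity and F1 follow from a confusion matrix, how AUC can be read as a ranking probability, and how DCA computes net benefit at one threshold probability.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds class by class, so that each fold
    keeps roughly the overall class ratio (illustrative helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def classification_metrics(y_true, y_pred):
    """Accuracy, recall, specificity and F1 from binary labels (1 = GDM)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def auc_roc(y_true, y_prob):
    """AUC as the probability that a random positive is scored above a
    random negative (ties count one half). Quadratic time, for clarity."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit at one threshold probability:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1 - threshold)
```

In a sketch like this, each fold in turn serves as the held-out set while a model is fitted on the remaining nine, and the metrics are averaged over folds. A decision curve is obtained by evaluating net_benefit over a range of thresholds and comparing the result with the "treat all" and "treat none" strategies.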