Risk prediction models for discordant growth in dichorionic twins were established based on the logistic regression algorithm and machine learning techniques
Abstract
Objective: To develop a risk prediction model for discordant growth in dichorionic twins, enabling early identification and screening of high-risk cases.

Methods: Clinical data from 1,098 dichorionic twin pregnancies delivered at Anhui Maternal and Child Health Hospital between January 2016 and January 2024 were retrospectively analyzed. The cohort was divided by the presence of discordant growth into two groups: 231 cases with discordant growth and 867 without. The dataset was randomly split into a training set (70%) and a validation set (30%). Predictive models were developed on the training set and evaluated on the validation set. Candidate predictors were selected through univariate and multivariate logistic regression (LR) analyses. Risk prediction models were then built using five machine learning (ML) algorithms: LR, random forest (RF), Gaussian Naive Bayes (GNB), k-nearest neighbors (k-NN), and extreme gradient boosting (XGBoost), to estimate the likelihood of discordant twin growth.

Results: Univariate LR identified birth time, pre-pregnancy benign hypertension, pre-pregnancy autoimmune disease, umbilical cord abnormalities, and placental abnormalities as significant risk factors for discordant growth (P < 0.05). Multivariate analysis confirmed the independence of these variables, with placental abnormalities showing the highest adjusted odds ratio (OR), followed by umbilical cord abnormalities and pre-pregnancy benign hypertension. The significance of pre-pregnancy autoimmune disease was reduced in the multivariate model. The LR model achieved an area under the curve (AUC) of 0.710 in the training set and 0.711 in the validation set. Sensitivity and specificity were 0.665 and 0.679, respectively, in the training set, and 0.667 and 0.638 in the validation set.
Positive predictive values (PPVs) were high in both sets (training: 0.886; validation: 0.874), while negative predictive values (NPVs) were lower (training: 0.351; validation: 0.336). The Hosmer-Lemeshow goodness-of-fit test indicated a satisfactory model fit (P = 0.456 for training; P = 0.338 for validation). Calibration curves showed that for threshold probabilities between 10% and 50%, the model provided substantial net clinical benefit in both sets. Among the ML models, k-NN achieved the highest AUC (0.687) and specificity (0.881), indicating the best discrimination among these models and a low false-positive rate. GNB showed the highest sensitivity (0.710), identifying the largest share of true positives. LR and RF demonstrated balanced but moderate performance. In clinical decision curve analysis, at a threshold probability of 0.5, RF and GNB retained a positive net benefit (0.4), while XGBoost showed a net loss (net benefit = -0.1), suggesting overconfident predictions. Overall, the k-NN model demonstrated the best predictive performance.

Conclusion: Prediction models developed using birth time, pre-pregnancy benign hypertension, pre-pregnancy autoimmune disease, umbilical cord abnormalities, and placental abnormalities showed good predictive value for discordant growth in dichorionic twins. These models can assist clinicians in risk assessment, clinical consultation, and targeted screening of high-risk groups, enabling more precise follow-up and intervention.
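
The evaluation quantities reported above (sensitivity, specificity, PPV, NPV, and the net benefit used in decision curve analysis) can all be derived from a confusion matrix at a chosen threshold probability. The following is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, the labels and predicted probabilities are made-up toy values rather than study data, and net benefit uses the standard definition TP/n - (FP/n) * pt / (1 - pt).

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # discordant growth, predicted high risk
    fp: int  # no discordant growth, predicted high risk
    tn: int  # no discordant growth, predicted low risk
    fn: int  # discordant growth, predicted low risk


def confusion_at_threshold(y_true, y_prob, threshold):
    """Count outcomes when 'high risk' means predicted probability >= threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return Confusion(tp, fp, tn, fn)


def summary_metrics(c):
    """Sensitivity, specificity, PPV, NPV (assumes every denominator is nonzero)."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def net_benefit(c, threshold):
    """Decision-curve net benefit at threshold probability pt (0 < pt < 1)."""
    n = c.tp + c.fp + c.tn + c.fn
    return c.tp / n - (c.fp / n) * threshold / (1 - threshold)


# Toy, hypothetical data: 1 = discordant growth, 0 = no discordant growth.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]

c = confusion_at_threshold(y_true, y_prob, threshold=0.5)
print(summary_metrics(c))
print(net_benefit(c, threshold=0.5))
```

Repeating the `net_benefit` call over a grid of thresholds (for example 0.10 to 0.50) and plotting the result against the "treat all" and "treat none" strategies gives a decision curve of the kind summarized in the Results.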