Machine Learning-Driven Early Prediction of Spontaneous Preterm Birth Subtypes from Second-Trimester Plasma Metabolomic
Abstract
Background: Preterm birth is a major cause of neonatal morbidity and mortality. Spontaneous preterm birth (sPTB) comprises preterm premature rupture of membranes (pPROM) and spontaneous preterm labor (sPL). Reliable early prediction remains challenging, particularly in distinguishing sPTB subtypes. Metabolomics offers a promising approach for identifying predictive biomarkers.

Methods: A single-center case-control study was conducted using archived maternal plasma samples collected at 14–20 weeks' gestation from 70 pregnant women (30 term controls, 20 pPROM, 20 sPL). Non-targeted metabolomic profiling was performed by liquid chromatography-mass spectrometry (LC-MS). Metabolites were screened using LASSO regression, followed by pathway enrichment analysis. Machine learning models (logistic regression) were developed and validated. Statistical analyses included PLS-DA, ROC curves, Pearson correlation, and risk stratification.

Results: Nine metabolites associated with inflammatory activation, oxidative stress, and placental dysfunction were identified. LASSO models achieved high predictive accuracy (AUCs: 0.984 for controls, 0.964 for pPROM, 0.995 for sPL). Creatinine and LysoPC(P-16:0) were positively correlated with gestational age at blood sampling (R = 0.27 and 0.23, respectively), while phosphatidylcholine was negatively correlated with maternal age (R = −0.31). Gestational age at delivery was negatively correlated with BMI (R = −0.51). In the high-risk stratum, the probability of preterm birth decreased with increasing gestation, whereas it remained stable in the low-risk stratum.

Conclusions: Second-trimester plasma metabolomics combined with machine learning could effectively predict sPTB and distinguish its subtypes. These findings support the potential for early risk stratification and personalized intervention, though multicenter validation is needed for clinical translation.
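
The abstract reports one AUC per group (controls, pPROM, sPL), which suggests each group was scored against the other two combined. The abstract does not say so explicitly, so the one-vs-rest setup in this minimal sketch is an assumption. The function names, group assignments, and scores are hypothetical toy values, not study data, and the code is not the authors' pipeline. It computes the AUC as the fraction of positive/negative pairs in which the positive sample receives the higher score, counting ties as one half.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: P(score of a positive > score of a negative).

    labels: iterable of 0/1; scores: iterable of model outputs, same length.
    Ties between a positive and a negative contribute 0.5.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(groups, scores, positive_group):
    """Score one group against all others combined (assumed evaluation setup)."""
    labels = [1 if g == positive_group else 0 for g in groups]
    return roc_auc(labels, scores)


# Hypothetical toy example: six samples and made-up pPROM model scores.
groups = ["control", "control", "pPROM", "pPROM", "sPL", "sPL"]
pprom_scores = [0.10, 0.40, 0.80, 0.35, 0.20, 0.30]

print(one_vs_rest_auc(groups, pprom_scores, "pPROM"))  # → 0.875
```

In the toy data, seven of the eight pPROM-versus-other pairs are ranked correctly, giving 7/8 = 0.875. The pairwise form is quadratic in sample size, which is acceptable for a cohort of 70 but would normally be replaced by a rank-based routine for larger datasets.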