Machine Learning for Dynamic and Short-term Prediction of Preeclampsia Using Routine Clinical and Laboratory Data


Abstract

Preeclampsia (PE) is a leading cause of maternal and perinatal morbidity and mortality, yet its unpredictable onset and rapid progression hinder timely management. Existing prediction tools often rely on specialized biomarkers, static assessments, or limited study cohorts, which limits their clinical utility and generalizability. We conducted a retrospective, multi-site cohort study including 58,839 pregnancies delivered at three NewYork-Presbyterian hospitals. Using routine information captured within the electronic health record (EHR), including blood pressure, other maternal characteristics, and routine laboratory tests, we developed extreme gradient boosting (XGBoost) based models to predict PE onset within 1-, 2-, and 4-week horizons across different gestational ages. Performance was assessed using nested cross-validation at the training site and externally validated through direct transfer, fine-tuning, and retraining strategies. Prediction accuracy increased from 28 to 34 weeks of gestational age, peaked at 34 weeks (AUC 0.863 at training; 0.808–0.834 at validation), declined at 38 weeks, and rebounded near delivery (AUC up to 0.890). Blood pressure was the most consistent predictor, while laboratory features such as albumin, alkaline phosphatase, and hematologic indices added value earlier in gestation, and demographic and obstetric factors gained importance later. Dynamic short-term prediction of PE in late gestation is feasible using routine data. This pragmatic, scalable approach provides opportunities for early intervention and is adaptable across diverse healthcare settings.
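
To make the prediction task concrete, the following is a minimal Python sketch of how a short-term horizon label and an AUC could be computed. It is not the authors' pipeline: the abstract does not state the exact label definition, so the window rule used here (onset strictly after the assessment week and no later than the assessment week plus the horizon) is an assumption, and the names `horizon_label` and `auc` are hypothetical. The sketch uses only the standard library and does not train an XGBoost model.

```python
from typing import Optional, Sequence


def horizon_label(assessment_week: float,
                  onset_week: Optional[float],
                  horizon_weeks: float) -> int:
    """Return 1 if PE onset falls within the horizon after assessment, else 0.

    Assumed rule (not taken from the abstract):
    assessment_week < onset_week <= assessment_week + horizon_weeks.
    onset_week is None for pregnancies without PE. Pregnancies whose onset
    precedes the assessment week also get 0 here; a real pipeline would
    more likely exclude them.
    """
    if onset_week is None:
        return 0
    return int(assessment_week < onset_week <= assessment_week + horizon_weeks)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: onset weeks (None = no PE) and model risk scores.
toy_onsets = [35.0, None, 36.5, 34.5]
toy_scores = [0.8, 0.1, 0.4, 0.3]

# Labels for an assessment at 34 weeks with a 1-week horizon.
toy_labels = [horizon_label(34.0, onset, 1.0) for onset in toy_onsets]
toy_auc = auc(toy_labels, toy_scores)
```

Under this assumed rule, the same pregnancy can be negative for the 1-week horizon and positive for the 4-week horizon, which is why a separate label (and model evaluation) is needed for each horizon and gestational age.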
