Large scale analysis and prediction of adverse maternal health events across heterogeneous populations from the All of Us dataset

Haojun Zhuang
Arthurine Zakama
Katherine Heller
Shakeela Faulkner
Bruce Gollub
Nichole Young-Lin
Irene Chen
Mercy Asiedu

Abstract

Objective: To investigate and predict adverse maternal health events across heterogeneous populations using the NIH's "All of Us Research Program" dataset.

Methods: We develop pipelines for extracting, cleaning, and preprocessing electronic health record (EHR) data (Conditions, Labs, Measurements) and survey responses from a multi-site pregnancy cohort (n=22,646 participants; 33,294 pregnancy episodes). We assess statistical correlations between lab measures, outcomes, and social determinants of health (SDoH). We then develop machine learning models (Logistic Regression, XGBoost, LSTM) to predict preeclampsia/eclampsia, preterm labor, depression/anxiety, gestational diabetes, miscarriage, and cardiomyopathy two weeks before clinical diagnosis. We conduct a post-hoc model interpretability analysis to understand failure points and to evaluate implications for socio-economic disparities. Finally, we conduct a clinician review of the top statistically correlated features and the top machine learning-derived features to determine whether they have the expected associations with outcomes and whether they are used in current clinical practice.

Results: Statistical analysis revealed both expected and unexpected correlations between EHR features and outcomes, as well as associations between SDoH and adverse events. Machine learning models performed best at predicting gestational diabetes (AUROC=0.848) and miscarriage (AUROC=0.864) two weeks before diagnosis, and performance was sensitive to temporal resolution. We found that feature importance was correlated with feature availability and statistical significance. Fairness evaluations revealed disparities in model performance across SDoH subgroups but minimal disparities across age or race subgroups. Clinician assessment identified features that were unexpected and/or not currently used in clinical practice.

Conclusion: This study demonstrates the potential of the "All of Us" dataset for understanding and predicting adverse maternal health outcomes. We show that meaningful population-level patterns can be extracted, and that high-performing machine learning models can be trained on this diverse, multi-site dataset, enabled by rigorous data preprocessing and problem formulation. Most of the important features identified through correlation analysis align with known clinical risk factors, reinforcing the robustness of the dataset and our preprocessing pipeline. Notably, we also discovered some unexpected associations that warrant further clinical investigation. While the machine learning models successfully leverage many of these relevant features, they also rely on some with limited or no apparent correlation to outcomes. These findings, if validated, could inform new strategies for maternal care. However, performance disparities across SDoH subgroups underscore the need for continued research and careful ethical consideration before clinical deployment.
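The abstract describes predicting each outcome two weeks before clinical diagnosis and evaluating fairness as model performance (AUROC) within SDoH, age, and race subgroups, but it does not spell out the windowing or evaluation code. The following is a minimal Python sketch of one way such a setup could look, not the authors' pipeline. It assumes a hypothetical per-episode record with `diagnosis_day`, `index_day` (an assumed reference day for episodes without a diagnosis), and `observations` given as `(day, feature, value)` tuples, and it summarizes each feature by its last value recorded before the cutoff. All function and field names are illustrative. AUROC is computed in its pairwise (Mann-Whitney) form so the example needs no third-party libraries.

```python
from collections import defaultdict

HORIZON_DAYS = 14  # prediction horizon: two weeks before clinical diagnosis


def features_before_cutoff(observations, cutoff_day):
    """Return the most recent value of each feature recorded strictly before cutoff_day.

    observations: list of (day, feature_name, value) tuples (hypothetical schema).
    """
    latest = {}
    # Stable sort by day only, so values are never compared with each other.
    for day, name, value in sorted(observations, key=lambda obs: obs[0]):
        if day < cutoff_day:
            latest[name] = value  # later observations overwrite earlier ones
    return latest


def make_example(episode, horizon_days=HORIZON_DAYS):
    """Turn one pregnancy episode into a (features, label) pair.

    Positive episodes are cut off horizon_days before diagnosis, so no data from
    the final two weeks leaks into the features. Negative episodes use the
    hypothetical index_day as their cutoff (an assumption, not from the paper).
    """
    if episode["diagnosis_day"] is not None:
        cutoff, label = episode["diagnosis_day"] - horizon_days, 1
    else:
        cutoff, label = episode["index_day"], 0
    return features_before_cutoff(episode["observations"], cutoff), label


def auroc(labels, scores):
    """AUROC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined when a subgroup contains only one class
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_subgroup(labels, scores, groups):
    """Compute AUROC separately within each subgroup (e.g. an SDoH category)."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auroc(ys, ss) for g, (ys, ss) in buckets.items()}


if __name__ == "__main__":
    episode = {
        "diagnosis_day": 180,
        "index_day": None,
        "observations": [(100, "glucose", 95.0), (160, "glucose", 130.0), (170, "glucose", 150.0)],
    }
    print(make_example(episode))  # ({'glucose': 130.0}, 1): cutoff is day 166
    labels, scores = [1, 0, 1, 0], [0.9, 0.3, 0.4, 0.5]
    print(auroc(labels, scores))  # 0.75
    print(auroc_by_subgroup(labels, scores, ["a", "a", "b", "b"]))  # {'a': 1.0, 'b': 0.0}
```

In the usage example, the day-170 glucose value is excluded because it falls inside the two-week horizon. The per-subgroup call shows how an overall AUROC can hide a large gap between groups, which is the kind of disparity a subgroup fairness evaluation is meant to surface.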

Version published to 10.21203/rs.3.rs-7105527/v1 on Research Square
Nov 25, 2025
