Early Prediction of Gestational Diabetes Using Integrated Cell-free DNA Features and Omics-derived Genetic Scores
Abstract
Background
Gestational diabetes mellitus (GDM) affects 15.6% of pregnancies globally, and Vietnam has one of the highest prevalences at 21%. Current diagnosis at 24-28 weeks of gestation leaves little opportunity for early intervention. We developed a multi-modal machine learning framework that integrates cell-free DNA (cfDNA) structural features and genetic information for early GDM prediction at 10-12 weeks of gestation in Vietnamese women.
Methods
We analyzed blood samples from 1,086 pregnant women (435 GDM cases, 651 controls) collected at 9-12 weeks. Two parallel analytical pathways were employed: cfDNA profiling, which extracted cfDNA-specific features (fragment length, end motifs, GC content, nucleosome patterns), and whole-genome imputation, which generated predictions for ∼19,000 omics traits. Component scores were developed using a TabPFN classifier and integrated via logistic regression into a unified master score.
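The integration step can be illustrated with a minimal sketch, assuming three component scores per woman that are combined by logistic regression into one probability. The component scores here are synthetic stand-ins, not TabPFN outputs or study data; the function names (`fit_logistic`, `master_score`, `synthetic_components`) are hypothetical, and the plain gradient-descent fit is a dependency-free substitute for a library logistic regression, not the authors' implementation.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def master_score(w, b, components):
    """Combine one woman's component scores into a single probability."""
    return sigmoid(b + sum(wj * cj for wj, cj in zip(w, components)))


def synthetic_components(label, rng):
    """Hypothetical stand-in for three component scores (not real data)."""
    centre = 0.6 if label == 1 else 0.4
    return [min(1.0, max(0.0, rng.gauss(centre, 0.15))) for _ in range(3)]


rng = random.Random(0)
y = [1] * 200 + [0] * 200  # assumed balanced toy cohort
X = [synthetic_components(label, rng) for label in y]
w, b = fit_logistic(X, y)

# Uniformly high component scores should map to a higher master score.
high = master_score(w, b, [0.9, 0.9, 0.9])
low = master_score(w, b, [0.1, 0.1, 0.1])
print(f"weights={w}, intercept={b:.3f}, high={high:.3f}, low={low:.3f}")
```

In this arrangement each component model is trained separately and the logistic regression only learns how much weight each component score receives.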
Results
Genome-wide analysis identified five omics traits with significant GDM associations: HSD11B1, NEK7, COMMD10, KLRC4, and OCEL1. Component score optimization revealed distinct patterns: cfDNA scores peaked at 200 features (AUC=71.53), while genetics-based scores improved with up to 2,000 omics traits (AUC=77.21). The final master score, integrating three components (gbSC2000, gbSCBH, cfSC200), achieved AUCs of 86.82-87.19 across validation cohorts with 70% sensitivity and 89% specificity. Addition-deletion analysis confirmed that both cfDNA and genetic components provided essential, non-redundant contributions.
Conclusions
This multi-modal framework demonstrates superior performance compared to single-biomarker approaches, enabling risk stratification from very low (4% GDM prevalence) to very high risk (90% prevalence). At a cutoff of 0.4, the model identifies 78% of future GDM cases at 10-12 weeks while maintaining an 18% false-positive rate, potentially enabling early interventions to prevent GDM development and associated complications.
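The relationship between a score cutoff, sensitivity, and false-positive rate can be sketched as follows. The scores and labels are made-up toy values chosen for easy arithmetic, not study data, and `threshold_metrics` is a hypothetical helper name. It assumes a woman is flagged when her score is at or above the cutoff.

```python
def threshold_metrics(scores, labels, cutoff):
    """Return (sensitivity, specificity, false-positive rate) at a cutoff.

    A sample is called positive when its score is >= cutoff (an assumption).
    """
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= cutoff)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < cutoff)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= cutoff)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < cutoff)
    sensitivity = tp / (tp + fn)  # share of true cases that are flagged
    specificity = tn / (tn + fp)  # share of controls that are not flagged
    return sensitivity, specificity, 1.0 - specificity


# Toy example: four cases (label 1) followed by four controls (label 0).
toy_scores = [0.9, 0.55, 0.45, 0.2, 0.7, 0.3, 0.1, 0.35]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec, fpr = threshold_metrics(toy_scores, toy_labels, cutoff=0.4)
print(sens, spec, fpr)  # 0.75 0.75 0.25
```

Lowering the cutoff flags more women, which raises sensitivity and the false-positive rate together. A reported false-positive rate of 18% corresponds to a specificity of 82% at that cutoff.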