Early Prediction of Gestational Diabetes Using Integrated Cell-free DNA Features and Omics-derived Genetic Scores

Abstract

Background

Gestational diabetes mellitus (GDM) affects 15.6% of pregnancies globally, and Vietnam has one of the highest prevalences, at 21%. Because current diagnosis takes place at 24-28 weeks of gestation, opportunities for early intervention are limited. We developed a multi-modal machine learning framework that integrates cell-free DNA (cfDNA) structural features and genetic information to predict GDM early, at 10-12 weeks of gestation, in Vietnamese women.

Methods

We analyzed blood samples collected at 9-12 weeks of gestation from 1,086 pregnant women (435 GDM cases, 651 controls). Two parallel analytical pathways were used: cfDNA profiling, which extracted cfDNA-specific features (fragment length, end motifs, GC content, and nucleosome patterns), and whole-genome imputation, which generated predicted values for ∼19,000 omics traits. Component scores were built with a TabPFN classifier and combined by logistic regression into a unified master score.
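
As an illustration of the cfDNA pathway, the sketch below summarises three of the listed feature families (fragment length, end motifs, GC content) from per-fragment records. It is a minimal sketch under stated assumptions, not the authors' pipeline: the input format (a list of `(length, sequence)` tuples with the sequence read from the 5' end), the function name `cfdna_features`, the length-bin edges in `LENGTH_EDGES`, and the use of 4-mer 5' end motifs are all hypothetical. Nucleosome-pattern features need genomic coordinates and are omitted.

```python
from collections import Counter

# Hypothetical length-bin edges in base pairs; the study's actual bins are not stated.
LENGTH_EDGES = [0, 100, 150, 200, 250, 10_000]


def cfdna_features(fragments, motif_k=4):
    """Turn hypothetical (length_bp, sequence) fragment records into a flat feature dict."""
    n = len(fragments)
    feats = {}

    # 1. Fragment-length profile: fraction of fragments falling in each [lo, hi) bin.
    for lo, hi in zip(LENGTH_EDGES, LENGTH_EDGES[1:]):
        count = sum(1 for length, _ in fragments if lo <= length < hi)
        feats[f"len_{lo}_{hi}"] = count / n

    # 2. End motifs: relative frequency of each 5' k-mer (assumed k = 4).
    motifs = Counter(seq[:motif_k].upper() for _, seq in fragments if len(seq) >= motif_k)
    total = sum(motifs.values())
    for motif, count in motifs.items():
        feats[f"motif_{motif}"] = count / total

    # 3. GC content: mean fraction of G or C bases per fragment.
    gc = [
        (seq.upper().count("G") + seq.upper().count("C")) / len(seq)
        for _, seq in fragments
        if seq
    ]
    feats["gc_mean"] = sum(gc) / len(gc)
    return feats


frags = [(167, "CCCAGTGCATG"), (145, "CCCAATGC"), (320, "TGGAACGT")]
feats = cfdna_features(frags)
print(round(feats["motif_CCCA"], 3))  # 0.667
```

In a real pipeline, each woman's feature dictionary would be aligned to a common feature list before being passed to a classifier; the 200-feature cfDNA score reported in the results implies some feature-selection step that this sketch does not model.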

Results

Genome-wide analysis identified five omics traits significantly associated with GDM: HSD11B1, NEK7, COMMD10, KLRC4, and OCEL1. Optimizing the component scores revealed distinct patterns: cfDNA scores peaked at 200 features (AUC = 71.53), whereas genetics-based scores kept improving with up to 2,000 omics traits (AUC = 77.21). The final master score, integrating three components (gbSC2000, gbSCBH, and cfSC200), achieved AUCs of 86.82-87.19 across validation cohorts, with 70% sensitivity and 89% specificity. Addition-deletion analysis confirmed that the cfDNA and genetic components each made essential, non-redundant contributions.
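
The integration step can be illustrated with a small, dependency-free sketch. It assumes each woman already has three component scores (standing in for gbSC2000, gbSCBH, and cfSC200), fits an unregularised logistic regression by batch gradient descent to combine them into a master score, and measures discrimination with the rank-based (Mann-Whitney) form of the ROC AUC. The function names, learning rate, epoch count, and toy data are hypothetical; the study's actual fitting procedure and the TabPFN models that produce the component scores are not reproduced here. This AUC is on a 0-1 scale, whereas the abstract reports AUC values on a 0-100 scale.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit unregularised logistic-regression weights by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one woman: predicted probability minus label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def master_score(components, w, b):
    """Combine one woman's component scores into a single probability-like score."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, components)))


def auc(scores, labels):
    """ROC AUC: chance a random case outscores a random control (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up component scores; columns stand in for gbSC2000, gbSCBH, cfSC200.
X = [[0.9, 0.8, 0.7], [0.8, 0.7, 0.9], [0.2, 0.3, 0.1], [0.1, 0.2, 0.3]]
y = [1, 1, 0, 0]
w, b = fit_logistic(X, y)
scores = [master_score(x, w, b) for x in X]
print(auc(scores, y))  # 1.0
```

Logistic regression is a natural choice for this second stage because it learns one weight per component score, so the contribution of each component can be inspected directly; an addition-deletion analysis like the one described above amounts to refitting with a component added or dropped and comparing AUCs.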

Conclusions

This multi-modal framework outperforms single-biomarker approaches and enables risk stratification ranging from very low risk (4% GDM prevalence) to very high risk (90% prevalence). At a cutoff of 0.4, the model identifies 78% of future GDM cases at 10-12 weeks with an 18% false-positive rate, potentially enabling early interventions to prevent GDM and its associated complications.
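
The cutoff-based figures and the prevalence-by-risk-group view can both be computed from master scores. Note that the false-positive rate equals 1 minus specificity, so an 18% false-positive rate corresponds to 82% specificity at the 0.4 cutoff. The sketch below is illustrative only: treating a score equal to the cutoff as high risk, and the band edges in the hypothetical `BAND_EDGES`, are assumptions, since the abstract gives only the 0.4 cutoff and the 4% and 90% prevalences of the extreme groups. The example scores and labels are made up.

```python
def cutoff_metrics(scores, labels, cutoff=0.4):
    """Sensitivity and false-positive rate; score >= cutoff is called high risk (assumed)."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 1)
    fn = sum(1 for s, l in zip(scores, labels) if s < cutoff and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 0)
    tn = sum(1 for s, l in zip(scores, labels) if s < cutoff and l == 0)
    return {"sensitivity": tp / (tp + fn), "false_positive_rate": fp / (fp + tn)}


# Hypothetical band edges; the study's risk-group boundaries are not given in the abstract.
BAND_EDGES = (0.0, 0.2, 0.4, 0.6, 0.8, float("inf"))


def stratum_prevalence(scores, labels, edges=BAND_EDGES):
    """Observed GDM prevalence (fraction of cases) within each score band [lo, hi)."""
    prevalence = {}
    for lo, hi in zip(edges, edges[1:]):
        band = [l for s, l in zip(scores, labels) if lo <= s < hi]
        prevalence[(lo, hi)] = sum(band) / len(band) if band else None
    return prevalence


scores = [0.9, 0.5, 0.3, 0.45, 0.1, 0.05]
labels = [1, 1, 1, 0, 0, 0]
print(cutoff_metrics(scores, labels))
print(stratum_prevalence(scores, labels)[(0.4, 0.6)])  # 0.5
```

Moving the cutoff trades sensitivity against the false-positive rate, which is why the 0.4 operating point (78% of cases detected) differs from the 70% sensitivity and 89% specificity reported for the validation cohorts.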
