Order-Based Bayesian Network Modeling of Early Detection and Post-Diagnosis Control for Cardiovascular Disease Risk in Type 2 Diabetes
Abstract
Patients diagnosed with type 2 diabetes (T2D) are at increased risk of developing cardiovascular disease (CVD), the leading cause of morbidity and mortality in this population. Early detection and glycemic control within the first year after diagnosis reduce CVD risk. However, gaps remain in how to operationalize early detection of T2D using Electronic Health Record (EHR) data and how to quantify its relationship with subsequent CVD risk using longitudinal observations. We developed a probabilistic graphical model to analyze the interdependencies between early detection of T2D, post-diagnosis glycemic control, and CVD occurrence. Using a temporally structured Bayesian Network (BN) learned from EHR data of 9,450 primary care patients between 2017 and 2023, we quantified probabilistic dependencies between demographics, diagnostic delay surrogates, glycemic control, and post-diagnosis CVD occurrence. Percentile-based thresholds defined risk groups: individuals with predicted probabilities in the bottom decile (≤ 10th percentile) were classified as low risk, and those in the top decile (≥ 90th percentile) as high risk. Results demonstrated heterogeneity in predicted risks across glycemic and cardiovascular outcomes. The predicted probability of developing CVD within the first year after T2D diagnosis ranged from a mean of 5.2% in the low-risk group to 28.9% in the high-risk group, while the predicted probability of mean Hemoglobin A1c (HbA1c) ≥ 8% during the first year post-diagnosis ranged from 1.6% in the low-risk group to 55.1% in the high-risk group. Patients with HbA1c ≥ 8% at diagnosis had higher predicted probabilities of first-year post-diagnosis mean HbA1c ≥ 8% (53.3% vs. 1.9%) and of a high HbA1c coefficient of variation (18.7% vs. 3.1%) than those with HbA1c ≤ 6.5%. Incorporating early clinical outcomes refined later risk predictions, with long-term CVD risk reaching 33.5% among high-risk individuals.
The proposed model achieved predictive performance comparable to conventional machine learning approaches while providing interpretable relationships for risk stratification in primary care populations.
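The percentile-based grouping described in the abstract can be illustrated with a minimal sketch. It assumes only what the abstract states: predicted probabilities at or below the 10th percentile are labeled low risk and those at or above the 90th percentile high risk. The function names, the "intermediate" label for the remaining patients, the linear-interpolation percentile rule, and the synthetic probabilities are all assumptions made for illustration; they are not the authors' implementation or study data.

```python
import random


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list.

    The interpolation rule is an assumption; the abstract does not specify one.
    """
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def stratify(probs, low_q=10, high_q=90):
    """Label each predicted probability as 'low', 'intermediate', or 'high'."""
    low_cut = percentile(probs, low_q)
    high_cut = percentile(probs, high_q)
    labels = []
    for p in probs:
        if p <= low_cut:
            labels.append("low")
        elif p >= high_cut:
            labels.append("high")
        else:
            labels.append("intermediate")  # hypothetical label for the middle 80%
    return labels


def group_mean(probs, labels, label):
    """Mean predicted probability within one risk group."""
    selected = [p for p, g in zip(probs, labels) if g == label]
    return sum(selected) / len(selected)


if __name__ == "__main__":
    # Synthetic predicted probabilities for illustration only (not study data).
    rng = random.Random(0)
    predicted = [rng.betavariate(2, 10) for _ in range(1000)]
    labels = stratify(predicted)
    for name in ("low", "high"):
        print(name, labels.count(name), round(group_mean(predicted, labels, name), 3))
```

Reporting the mean predicted probability within each group, as `group_mean` does, mirrors how the abstract summarizes the low-risk and high-risk groups, though the values produced here are synthetic and carry no clinical meaning.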
Author summary
People with type 2 diabetes (T2D) are more likely to develop cardiovascular disease (CVD) after becoming diabetic. Having multiple diseases leads to a greater burden of illness and a higher risk of death. Detecting T2D early and managing it soon after diagnosis can help prevent CVD, but it is difficult to identify early signs in data and to understand how they relate to each other. In this study, we used healthcare data from primary care patients to explore how early signs of T2D (such as laboratory measurements) and health complications are connected. We developed a data-driven model that connects patient characteristics, clinical observations, and health outcomes. Our findings showed that having high blood sugar values at diagnosis was associated with poorer diabetes control later on and put individuals at higher risk of CVD. In particular, clinical observations during the first year after T2D diagnosis were important for predicting whether someone will develop CVD later. By connecting data to health outcomes over time, our model may help clinicians identify high-risk individuals earlier and support more personalized T2D management in primary care settings.