Exploration and Analysis of Risk Factors for Coronary Artery Disease with Type 2 Diabetes Based on SHAP Explainable Machine Learning Algorithm
Abstract
Background: Type 2 diabetes mellitus (T2DM) significantly elevates the risk of coronary heart disease (CHD). This study uses interpretable machine learning (ML) to identify risk factors for CHD with comorbid T2DM, with the aim of supporting clinical decision-making.

Methods: Clinical data from 5,681 cardiovascular patients (4,396 with CHD alone; 1,285 with CHD+T2DM) hospitalized between 2001 and 2018 were analyzed. The SMOTENC algorithm was used to address class imbalance. Predictive variables were selected by univariate analysis followed by Lasso regression. Five ML models (logistic regression, Lasso regression, KNN, SVM, XGBoost) were developed and evaluated using accuracy, sensitivity, specificity, ROC curves, and decision curve analysis. SHAP values were used to interpret model outputs.

Results: Data were split into training (n=3,977) and validation (n=1,704) sets. Lasso regression identified 25 predictive variables. XGBoost performed best, with the highest accuracy (0.89) and AUC (0.93) and the greatest net benefit in decision curve analysis. SHAP analysis identified diabetes duration, blood glucose (BG), prothrombin time (PT), and glycated hemoglobin (HbA1c) as the primary risk factors. Positive urine glucose and elevated low-density lipoprotein also contributed significantly.

Conclusion: Diabetes history, BG, HbA1c, and PT are critical risk factors for CHD-T2DM comorbidity. Prioritizing monitoring of these parameters and implementing targeted interventions may mitigate risk. The XGBoost-SHAP framework provides an interpretable tool for clinical risk stratification.
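To make the role of SHAP values more concrete, the following is a minimal sketch of the exact Shapley attribution that SHAP approximates. It is not the study's pipeline: the study applies SHAP to a fitted XGBoost model, whereas this sketch enumerates every feature coalition for a tiny hand-written scoring function. The function `toy_risk`, the feature names, the coefficients, and the patient and baseline values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample `x` relative to `baseline`.

    Features outside a coalition take their baseline value. The cost is
    exponential in the number of features, so this suits toy cases only.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k: k! (n-k-1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Input with the coalition "switched on", target feature off
                without = {f: (x[f] if f in subset else baseline[f]) for f in names}
                # Same input with the target feature switched on as well
                with_feature = dict(without)
                with_feature[name] = x[name]
                total += weight * (predict(with_feature) - predict(without))
        phi[name] = total
    return phi


def toy_risk(p):
    """Hypothetical risk score (NOT the paper's model); ignores `pt` on purpose."""
    score = 0.02 * p["diabetes_years"] + 0.05 * p["hba1c"]
    # Hypothetical interaction: long-standing diabetes with poor control
    if p["diabetes_years"] > 10 and p["hba1c"] > 7.0:
        score += 0.1
    return score


# Hypothetical reference patient and patient of interest
baseline = {"diabetes_years": 0, "hba1c": 5.5, "pt": 12.0}
patient = {"diabetes_years": 15, "hba1c": 8.5, "pt": 12.0}

phi = shapley_values(toy_risk, patient, baseline)
for feature, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature}: {contribution:+.3f}")
```

Two properties are worth checking on any such attribution: the contributions sum to the difference between the patient's score and the baseline score, and a feature the model never uses receives a contribution of zero. In this toy case the interaction bonus is split equally between the two features that trigger it. For tree ensembles such as XGBoost, SHAP tooling computes these attributions with a much faster tree-specific algorithm rather than by brute-force enumeration.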