Construction and Validation of an Interpretable Machine Learning Model for Predicting Diabetes Risk in COPD Patients
Abstract
Objective: To develop a machine learning (ML)-based prediction model that identifies COPD patients at high risk of diabetes, thereby facilitating early and personalized management of this complication.

Methods: Data from COPD patients in the MIMIC-IV database were split into training (70%) and validation (30%) sets. LASSO regression and logistic regression were used to screen 49 candidate variables, and six ML algorithms were used to construct and internally validate the prediction model. Model performance was evaluated using multiple metrics, followed by external validation. Finally, SHAP (SHapley Additive exPlanations) analysis was performed to interpret the model.

Results: All six ML algorithms performed well on the training, testing, and validation sets, as evidenced by ROC curve analysis, with LightGBM showing the best overall performance. Feature importance analysis identified marital status, blood glucose level, and insurance type as the three most important predictors of diabetes in COPD patients.

Conclusion: This study developed an interpretable ML-based risk prediction model for diabetes in COPD patients. The model offers clinicians a new tool for early, personalized intervention, with the aim of improving patient prognosis.
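To make the 70%/30% split and the ROC-based evaluation described in the Methods more concrete, the following is a minimal sketch in plain Python. It is not the authors' code: the data are synthetic, the helper names `split_70_30` and `roc_auc` are hypothetical, and a single simulated glucose value stands in for a model's risk score. The study itself used MIMIC-IV data, 49 screened variables, and six ML algorithms, none of which are reproduced here.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle rows and split them 70% / 30% (training / validation)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive case scores higher than a randomly chosen
    negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic toy cohort (assumption): one glucose-like value per patient and
# a noisy binary label loosely tied to it. This is NOT MIMIC-IV data.
rng = random.Random(42)
rows = []
for _ in range(200):
    glucose = rng.gauss(110, 25)
    label = 1 if glucose + rng.gauss(0, 20) > 125 else 0
    rows.append((glucose, label))

train, valid = split_70_30(rows)

# Use the raw glucose value as a stand-in "risk score" on the validation set.
auc = roc_auc([y for _, y in valid], [g for g, _ in valid])
print(len(train), len(valid), round(auc, 3))
```

In a real pipeline the score passed to `roc_auc` would be the predicted probability from a model fitted on the training set only, so that the validation AUC reflects performance on unseen patients.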