Advanced Predictive Modeling of Type 2 Diabetes Using XGBoost and Explainable AI

Abstract

The rising global prevalence of type 2 diabetes (D.M. II) poses a significant public health challenge and calls for accurate predictive models. This study applies machine learning (ML) algorithms and explainable artificial intelligence (XAI) techniques to predict the risk of D.M. II using health data from the Dena Cohort in Yasuj, Iran. Data were collected from 3,203 individuals aged 35 to 70 and included demographic, clinical, and lifestyle features. Two ML models, XGBoost and CatBoost, were developed and evaluated. Data preprocessing involved handling missing values, normalizing continuous variables, and addressing class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE). Model performance was assessed using accuracy, F1-score, and area under the receiver operating characteristic curve (AUC). SHAP (SHapley Additive exPlanations) values were used to explain model predictions and improve interpretability. The XGBoost model performed better, achieving an accuracy of 96.07% and an AUC of 99.29%. Key predictive factors included fasting blood sugar, fatty liver, urolithiasis, age, and lifestyle factors such as energy drink consumption and television watching. SHAP showed how individual features contributed to the model's predictions, making the model more transparent for healthcare professionals. These findings highlight the potential of ML and XAI to improve the prediction of D.M. II. By identifying critical risk factors, the predictive models can support personalized healthcare interventions, which may improve patient outcomes and reduce the healthcare burden associated with diabetes. This research advocates integrating advanced predictive analytics into clinical practice to strengthen diabetes prediction strategies.
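To make the oversampling and evaluation steps named in the abstract concrete, the following is a minimal, dependency-free Python sketch of SMOTE-style interpolation and of the three reported metrics (accuracy, F1-score, AUC). It is not the authors' code: the function names, the neighbour count `k`, and the toy data are assumptions made for illustration, and a real analysis would more likely rely on established implementations such as imbalanced-learn's `SMOTE` and scikit-learn's metric functions.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    randomly chosen row and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two numeric rows of equal length."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc(y_true, scores):
    """Probability that a random positive case receives a higher score than
    a random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data, invented for illustration only (not from the Dena Cohort).
minority_rows = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]]
new_rows = smote(minority_rows, n_new=5)

labels = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
risk_scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
print(len(new_rows), accuracy(labels, predicted),
      f1_score(labels, predicted), auc(labels, risk_scores))
```

Because each synthetic row lies on a line segment between two existing minority rows, oversampling of this kind is normally applied to the training split only, so that evaluation data contain no interpolated cases.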
