DiaHealth: Early Prediction of Type-2 Diabetes with Associated Risk Factors Using Machine Learning and Explainable AI

Abstract

Background: Diabetes mellitus is a rapidly growing global health concern with severe medical and socio-economic consequences, particularly in developing nations. Although machine-learning-based solutions offer potential for early diagnosis, existing systems often lack accuracy, validation on external datasets, or relevance to specific local populations. This study aims to address these limitations by developing a highly accurate and explainable diabetes prediction framework together with a clinically validated dataset.

Methods: We developed a novel framework for Type-2 diabetes prediction by creating and analyzing a new, clinically validated dataset of real-world health records from respondents across Bangladesh. To identify the most predictive risk factors, we implemented a hybrid feature selection method that combines correlation feature selection and forest penalized attributes (CFS-FPA). A fine-tuned Random Forest classifier was trained on this data, and its performance was validated and benchmarked against two public datasets (PIMA and Ranchi). To ensure clinical utility and transparency, we integrated Explainable AI techniques (SHAP and LIME) to provide local and global explanations for the model's predictions.

Results: The classification model achieved an accuracy of 98.26% on our dataset. It also generalized to the external datasets, achieving 85.5% accuracy on the PIMA dataset and 91.7% on the Ranchi dataset. The explainability analysis identified key risk factors and provided transparent, actionable insights into the rationale behind each prediction.

Conclusions: This study developed and validated both a dataset and a highly accurate, interpretable machine learning framework for early prediction of diabetes. Using a novel, locally relevant dataset, our approach provides a transparent and reliable tool that can help clinicians make informed decisions, with significant potential to improve health outcomes in resource-limited settings like Bangladesh.
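
To make the correlation feature selection step named in the Methods more concrete, the sketch below shows one common formulation: a greedy forward search that maximizes the standard CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-to-class correlation and r_ff is the mean feature-to-feature correlation. It is an illustration only and not the authors' implementation. The abstract does not specify the correlation measure or search strategy, so Pearson correlation and greedy search are assumptions, the forest-based half of CFS-FPA is not shown, and the feature names and values are hypothetical toy data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def cfs_merit(subset, columns, target):
    """CFS merit: rewards correlation with the class, penalizes redundancy among features."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(columns, target, tol=1e-9):
    """Add features one at a time while the merit improves by more than tol."""
    selected, best = [], 0.0
    remaining = sorted(columns)
    while remaining:
        merit, feat = max((cfs_merit(selected + [f], columns, target), f)
                          for f in remaining)
        if merit <= best + tol:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = merit
    return selected, best

# Hypothetical toy data: 8 records, label 1 = diabetic.
target = [0, 0, 0, 0, 1, 1, 1, 1]
glucose = [90, 95, 100, 105, 140, 150, 160, 170]
columns = {
    "glucose": glucose,
    "glucose_dup": [2 * g for g in glucose],  # fully redundant copy of glucose
    "noise": [1, 2, 1, 2, 1, 2, 1, 2],        # uncorrelated with the label
}

selected, merit = greedy_cfs(columns, target)
print(selected, round(merit, 3))
```

On this toy data the search keeps a single glucose column: the redundant copy adds no merit, and the uninformative column lowers it.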