Evaluating the Performance of Ensemble Learning Methods in Diabetes Disease Classification
Abstract
Diabetes mellitus is a prevalent metabolic disorder characterized by chronic hyperglycemia and associated with severe complications. Accurate early detection is essential for effective management and prevention of disease progression. This study systematically evaluates the performance of three ensemble learning approaches (Bagging, Boosting, and Stacking) on three benchmark diabetes datasets: Pima Indians Diabetes, Frankfurt Hospital Diabetes, and Sylhet Hospital Diabetes (NIDDK). Class imbalance, a common challenge in these datasets, was addressed during preprocessing with the Synthetic Minority Oversampling Technique (SMOTE) to enhance model stability and classification reliability. Experimental results indicate that Boosting-based methods consistently outperform Bagging and Stacking. On the Pima dataset, Gradient Boosting, Extreme Gradient Boosting, and CatBoost achieved a maximum accuracy of 81.82%. On the Frankfurt dataset, Light Gradient Boosting reached 99.25% accuracy, while on the NIDDK dataset, Light Gradient Boosting and CatBoost attained perfect accuracy (100%). These findings highlight the effectiveness of combining SMOTE with Boosting-based ensemble models to mitigate class imbalance and improve diabetes classification. The results underscore the importance of both data preprocessing and algorithm selection in achieving high predictive performance, with significant implications for precision medicine and clinical decision support.
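The pipeline the abstract describes (SMOTE oversampling of the minority class, then a Boosting classifier) can be sketched as follows. This is a minimal illustration, not the authors' code: the toy dataset, hyperparameters, and the hand-rolled `smote` helper (interpolating between minority samples and their nearest minority neighbors, as in the original SMOTE algorithm) are assumptions standing in for the paper's datasets and tuned models.

```python
# Sketch: SMOTE-style oversampling followed by Gradient Boosting.
# The synthetic dataset below is a stand-in for a diabetes dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def smote(X, y, minority=1, k=5, seed=0):
    """Minimal SMOTE: create synthetic minority samples by interpolating
    between a minority point and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = (y != minority).sum() - len(X_min)  # samples needed to balance
    # pairwise distances among minority samples; k nearest neighbors (skip self)
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=2)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))          # random minority sample
        j = nn[i, rng.integers(k)]            # one of its neighbors
        synth.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return (np.vstack([X, synth]),
            np.concatenate([y, np.full(n_new, minority)]))

# Imbalanced toy data (~70/30 class split) as a placeholder.
X, y = make_classification(n_samples=600, weights=[0.7, 0.3], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Oversample only the training split, then fit the Boosting model.
X_bal, y_bal = smote(X_tr, y_tr)
clf = GradientBoostingClassifier(random_state=42).fit(X_bal, y_bal)
acc = accuracy_score(y_te, clf.predict(X_te))
```

Note that SMOTE is applied only after the train/test split; oversampling before splitting would leak synthetic copies of test-adjacent points into training and inflate the reported accuracy.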