Evaluating the Performance of Ensemble Learning Methods in Diabetes Disease Classification


Abstract

Diabetes mellitus is a prevalent metabolic disorder characterized by chronic hyperglycemia and associated with severe complications. Accurate early detection is essential for effective management and prevention of disease progression. This study systematically evaluates the performance of three ensemble learning approaches (Bagging, Boosting, and Stacking) on three benchmark diabetes datasets: Pima Indians Diabetes, Frankfurt Hospital Diabetes, and Sylhet Hospital Diabetes (NIDDK). Class imbalance, a common challenge in these datasets, was addressed during preprocessing using the Synthetic Minority Oversampling Technique (SMOTE) to enhance model stability and classification reliability. Experimental results indicate that Boosting-based methods consistently outperform Bagging and Stacking. On the Pima dataset, Gradient Boosting, Extreme Gradient Boosting, and CatBoost achieved a maximum accuracy of 81.82%. On the Frankfurt dataset, Light Gradient Boosting reached 99.25% accuracy, while on the NIDDK dataset, Light Gradient Boosting and CatBoost attained perfect accuracy (100%). These findings highlight the effectiveness of combining SMOTE with Boosting-based ensemble models to mitigate class imbalance and improve diabetes classification. The results underscore the importance of both data preprocessing and algorithm selection in achieving high predictive performance, with significant implications for precision medicine and clinical decision support.
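The abstract names SMOTE as the preprocessing step but the preprint text here includes no code, so as a rough illustration only, the sketch below implements the core SMOTE idea in plain NumPy: each synthetic minority sample is a random interpolation between a minority point and one of its k nearest minority-class neighbours. The function name, signature, and parameters are our own choices for illustration, not the authors' implementation (which presumably uses a library such as imbalanced-learn).

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=None):
    """Minimal SMOTE sketch (illustrative, not the paper's code).

    X_min : (n, d) array of minority-class samples.
    n_new : number of synthetic samples to generate.
    k     : number of nearest minority neighbours to interpolate toward.
    """
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # exclude self-matches
    k = min(k, n - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    # Pick a random base sample and one of its k neighbours per new point.
    base = rng.integers(0, n, size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    # Interpolate a random fraction of the way from base toward neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nb] - X_min[base])
```

After oversampling the minority class to parity with the majority class, the balanced training set would be fed to the Boosting models the study compares (Gradient Boosting, XGBoost, LightGBM, CatBoost).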
