An Evaluation of Machine Learning Categories for Diabetes Prediction and Detection in Libya: A Comparative Study
Abstract
Background: Diabetes mellitus is a growing global health concern, projected to affect over 1.31 billion people by 2050. Early detection is vital, and machine learning (ML) offers a promising tool for predicting and managing the disease.

Aim: This study aimed to introduce a structured classification of ML algorithms into three categories and to evaluate their performance in predicting diabetes using locally collected patient data.

Methods: A dataset of 806 participants (403 diabetic and 403 non-diabetic) was analyzed using the following attributes: sex, age, body mass index, blood glucose, blood pressure, diabetes pedigree function, and number of pregnancies (females only). ML algorithms were grouped into three categories: Simple Computational (Logistic Regression, Naïve Bayes), Tree-based (Random Forests, Gradient Boosted Trees), and Margin-based (Support Vector Machines, Fast Large Margin). Data were partitioned into training, validation, and testing sets using stratified sampling and cross-validation. Performance was assessed using accuracy, error rate, precision, recall (sensitivity), specificity, and F-measure.

Results: Tree-based algorithms outperformed the other two categories, with Gradient Boosted Trees achieving the highest accuracy (97.8%), followed by Random Forests (97.5%). This category also achieved the best specificity, precision, and F-measure. In contrast, Simple Computational algorithms showed the highest sensitivity (Logistic Regression 99.3%, Naïve Bayes 98.8%), correctly identifying nearly all diabetic cases.

Conclusion: The study's classification framework provides a systematic basis for comparing ML models and highlights the strengths of each category. It offers a foundation for hybrid approaches that combine high accuracy with strong sensitivity, which could support more accurate diagnosis and better clinical decision-making.
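
The performance measures named in the Methods (accuracy, error rate, precision, recall, specificity, and F-measure) all derive from the four cells of a binary confusion matrix. The following is a minimal Python sketch of those standard definitions; the function name `classification_metrics` and the counts passed to it are hypothetical and chosen for illustration only, not taken from the study's data or tooling.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: diabetic cases predicted positive/negative.
    tn/fp: non-diabetic cases predicted negative/positive.
    Assumes every denominator below is non-zero.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only (not the study's results).
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a screening context, recall reflects how few diabetic cases are missed, while specificity reflects how few non-diabetic cases are wrongly flagged, which is why a model can rank highest on one measure without ranking highest on accuracy.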