CATBoost-Based Multilingual System for Predicting Type 2 Diabetes
Abstract
Predictive systems have demonstrated potential in predicting Type 2 diabetes (T2D), yet they face limitations that affect prediction reliability and accessibility. Previous work has not sufficiently addressed multilingual support, for example for the Yorùbá language, or the use of local datasets in developing these systems. This study addresses these gaps by creating a multilingual predictive system for T2D, leveraging the CATBoost machine learning algorithm to enhance prediction accuracy and inclusivity. The study used datasets from several hospitals and a community in Ogbomoso and Akure, totalling 1,197 records, and examined 13 risk factors. Four machine learning algorithms (Decision Tree, Logistic Regression, Naïve Bayes and CATBoost) were applied to both a non-invasive and an invasive method. The invasive method builds a model that includes blood glucose measurement, while the non-invasive method builds a model from external factors such as age, blood pressure, and lifestyle data. The system was implemented in both English and Yorùbá. Evaluation metrics were accuracy, MCC, AUC, recall, Kappa, precision and F1-Score. The two methods were compared using a paired-sample t-test and the Wilcoxon signed-rank test. For the non-invasive method, CATBoost achieved an accuracy of 90.60%, an AUC of 0.9032, a recall of 0.6591, a precision of 0.9073, an F1-Score of 0.7622, a Kappa of 0.7054, and an MCC of 0.7203. For the invasive method, CATBoost achieved an accuracy of 97.57%, an AUC of 0.9865, a recall of 0.9789, a precision of 0.9798, an F1-Score of 0.9789, a Kappa of 0.9503, and an MCC of 0.951. This study developed a predictive system for early prediction of Type 2 diabetes. The system is applicable for diabetes screening in both English and Yorùbá.
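To make the reported evaluation metrics concrete, the following is a minimal sketch of how accuracy, precision, recall, F1-Score, Kappa and MCC can be derived from a binary confusion matrix. It assumes that Kappa refers to Cohen's kappa and MCC to the Matthews correlation coefficient. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study's data. AUC is omitted because it requires predicted scores rather than a single confusion matrix.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute threshold-based metrics from the four confusion-matrix cells.

    Assumes every row and column of the matrix is non-empty, so that
    none of the denominators below is zero.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - expected) / (1 - expected)

    # Matthews correlation coefficient: correlation between the
    # predicted and the true binary labels.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not from the study).
example = binary_metrics(tp=40, fp=10, fn=20, tn=130)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

A pattern like the one reported for the non-invasive method, where precision is high but recall is noticeably lower, corresponds to a matrix with few false positives but a larger share of false negatives. Kappa and MCC reflect this imbalance more strongly than accuracy does.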