Machine Learning Techniques for Predicting Brain Stroke Risk: Addressing Data Imbalance
Abstract
Stroke is a significant global health concern and a major contributor to morbidity and mortality worldwide. This research investigates the use of several machine learning algorithms to predict stroke risk. It places particular emphasis on the class imbalance common in stroke prediction datasets, where stroke cases make up only a small minority of records. We evaluate Random Forest, K-Nearest Neighbors (K-NN), Naive Bayes, Decision Trees, Support Vector Machines (SVM), and Logistic Regression on both the original imbalanced dataset and a balanced dataset created with the Synthetic Minority Over-sampling Technique (SMOTE). The models achieved high accuracy on the imbalanced dataset, but that figure can be inflated by correct predictions on the majority (non-stroke) class. Applying SMOTE gave a more reliable evaluation of model performance, with Random Forest attaining the highest accuracy at 92%. This study highlights the importance of class-balancing methods such as SMOTE for improving the predictive capability of machine learning models in medical applications. Because better stroke prediction enables timely intervention, these findings have considerable implications for patient outcomes and for the allocation of healthcare resources.
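The abstract does not say which SMOTE implementation the authors used or how it was configured. The sketch below is therefore only a from-scratch illustration of the general SMOTE idea, not the study's pipeline. Each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its k nearest minority-class neighbours. The function names `smote` and `balance`, the default `k=5`, the fixed `seed`, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic new minority samples by SMOTE-style interpolation.

    Hypothetical helper: each new point lies between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # fixed seed so results are reproducible
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest *other* minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts."""
    minority = [tuple(x) for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    n_needed = max(0, n_majority - len(minority))
    new_points = smote(minority, n_needed, k=k, seed=seed)
    X_bal = [tuple(x) for x in X] + new_points
    y_bal = list(y) + [minority_label] * len(new_points)
    return X_bal, y_bal


if __name__ == "__main__":
    # Toy data (hypothetical): 8 majority (label 0) vs 3 minority (label 1).
    X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.0),
         (0.2, 0.2), (0.0, 0.3), (0.3, 0.1),
         (1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
    y = [0] * 8 + [1] * 3
    X_bal, y_bal = balance(X, y)
    print(y_bal.count(0), y_bal.count(1))  # → 8 8
```

In practice, a library implementation such as `SMOTE` from the imbalanced-learn package (used via `fit_resample`) would normally replace a hand-written version. Oversampling is usually applied only to the training split, so that synthetic points do not leak into the data used for evaluation.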