Machine Learning Techniques for Predicting Brain Stroke Risk: Addressing Data Imbalance

Heshan Chandeepa Pathmakumara
Kavishka Thathsarani Rajapaksha

Abstract

Stroke is a significant global health concern that contributes substantially to morbidity and mortality worldwide. This research investigates the use of machine learning algorithms to forecast stroke risk, with a specific focus on the class imbalance that is common in stroke prediction datasets. We assess the performance of six models: Random Forest, K-Nearest Neighbors (K-NN), Naive Bayes, Decision Trees, Support Vector Machines (SVM), and Logistic Regression. Each is evaluated on both the original imbalanced dataset and a balanced dataset created with the Synthetic Minority Over-sampling Technique (SMOTE). Our results indicate that, although the models reached high accuracy on the imbalanced dataset, applying SMOTE gave a more reliable evaluation of model efficacy, with Random Forest attaining the highest accuracy at 92%. This study highlights the role of class balancing methods such as SMOTE in improving the predictive capability of machine learning models in medical applications. The findings have practical implications: better stroke prediction models can support timely interventions, which in turn can improve patient outcomes and the allocation of healthcare resources.
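The abstract names SMOTE but does not describe how it works. As a rough illustration only, the core idea can be sketched as follows: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its k nearest minority-class neighbours. This is a minimal standard-library sketch, not the authors' implementation. The function name `smote`, its parameters, and the toy feature values (age and average glucose level) are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples
    and their k nearest minority-class neighbours (illustrative sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data, [age, avg_glucose_level]; not from the paper.
stroke = [[67.0, 228.7], [80.0, 105.9], [74.0, 170.0], [61.0, 202.2]]
no_stroke = [[float(20 + i), 80.0 + i] for i in range(40)]

# Generate enough synthetic stroke cases to match the majority class size.
extra = smote(stroke, n_synthetic=len(no_stroke) - len(stroke), k=3)
balanced_minority = stroke + extra
print(len(balanced_minority), len(no_stroke))  # prints "40 40"
```

Because every synthetic point lies between two real minority samples, the new cases stay within the range of the observed minority data rather than duplicating existing rows. A maintained implementation, such as the one in the imbalanced-learn library, would normally be preferred over a hand-written version like this one.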

Version published to 10.31235/osf.io/xv3k8_v1 on OSF Preprints
Oct 10, 2025

Related articles

Benchmarking Ensemble Machine Learning Algorithms for the Early Prediction of Stroke in Imbalanced Clinical Cohorts: A Comparative Analysis and Decision Curve Assessment

This article has 2 authors:
1. Ibrahim Ibrahim Shuaibu
2. Yousaf Hussain
This article has no evaluations. Latest version: Jan 22, 2026

Heart Disease Detection with Machine Learning Algorithms

This article has 2 authors:
1. Fatemeh Hosseinabadi
2. Seyedhassan Sharifi
This article has no evaluations. Latest version: Jan 6, 2026

Comparing Algorithm Effectiveness in Health Data Analysis

This article has 1 author:
1. Abdulmalik Hazaa Alshammari
This article has no evaluations. Latest version: Jan 22, 2026
