Machine Learning–Based Prediction of Heart Disease Using Logistic Regression, Support Vector Machine, and Random Forest Classifier

Soobia Saeed

Read the full article

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

Heart disease is still at the top of the list of causes of deaths around the globe, which shows that there is a great need for early and accurate diagnostic methods that will aid clinical decision-making. A machine learning–based predictive system for heart disease will be developed and evaluated in this project using a real-world Heart Failure Prediction dataset that contains 918 anonymized patient records and 11 clinical attributes. As part of data preprocessing, medically impossible values were identified and treated, invalid cholesterol readings were replaced with the median, non-sensical entries were removed, categorical variables were encoded, and feature standardization was done to ready the dataset for model training. Accordingly, Logistic Regression, Support Vector Machine (SVM) with an RBF kernel, and Random Forest were three supervised learning algorithms implemented to evaluate their performances in binary classification. To guarantee data quality and model trustworthiness, Exploratory Data Analysis (EDA) and cross-validation were done. Model performance evaluation included the use of accuracy, precision, recall, F1-score, confusion matrices, and ROC–AUC metrics. The results indicate that the Random Forest classifier produced the best overall performance with an accuracy of 87.50%, precision of 91.59%, recall of 87.50%, F1-score of 89.50%, and an AUC of 0.9391, thus beating both SVM and Logistic Regression. Though Logistic Regression gave a comprehensible baseline, its greater false-negative rate made it less suitable for high-risk clinical applications. SVM displayed excellent non-linear classification power but needed more computational tuning. Taken together, these results show that Random Forest is the most dependable and robust model for heart disease prediction with this dataset. The next step should be incorporating wider lifestyle factors, using improved data collection methods, sophisticated outlier handling, additional machine learning models, and possibly deployment as a clinical decision-support tool through web or mobile applications.

Version published to 10.20944/preprints202511.1985.v1
Nov 25, 2025

Heart Disease Detection with Machine Learning Algorithms

This article has 2 authors:
1. Fatemeh Hosseinabadi
2. Seyedhassan Sharifi
This article has no evaluationsLatest version Jan 6, 2026
Machine Learning Insights for Cardiovascular Risk Prediction in Diabetic Patients: Emphasis on Renal and Cardiac Markers Using Random Forests

This article has 1 author:
1. Julian Borges
This article has no evaluationsLatest version Jan 21, 2026
Machine Learning Models in Classifying, Predicting and Managing COVID-19 Severity

This article has 10 authors:
1. Larysa Sydorchuk
2. Maksym Sokolenko
3. Miroslav Škoda
4. Denys Nevinskyi
5. Yaroslav Vyklyuk
6. Ruslan Sydorchuk
7. Alina Sokolenko
8. Ludmila Sokolenko
9. Andrii Sydorchuk
10. Oleksandr Sokolenko
This article has no evaluationsLatest version Jan 27, 2026

Discuss this preprint

Listed in

Abstract

Article activity feed

Related articles

Heart Disease Detection with Machine Learning Algorithms

Machine Learning Insights for Cardiovascular Risk Prediction in Diabetic Patients: Emphasis on Renal and Cardiac Markers Using Random Forests

Machine Learning Models in Classifying, Predicting and Managing COVID-19 Severity