Comprehensive Analysis of Machine Learning Algorithms for Predicting Heart Disease Across Genders: A Balanced Approach to Model Selection and Performance Evaluation
Abstract
Background
Heart disease remains the leading cause of death worldwide, and it affects men and women differently in terms of symptoms, risk factors, and recovery. This study compares heart attack characteristics between genders using machine learning approaches. To avoid bias and ensure equitable model evaluation, a well-balanced dataset that includes lifestyle factors, clinical records, and demographic information is essential.

Methods
Several machine learning models, including Random Forest, Decision Trees, Support Vector Machines (SVM), and Logistic Regression, were evaluated. Comparing several models helps identify the most suitable method for heart disease prediction. Model effectiveness was assessed using accuracy, precision, recall, F1 score, and the area under the ROC curve (AUC-ROC).

Results
All models performed better on the female dataset than on the male dataset, particularly K-Nearest Neighbour, Naïve Bayes, and Logistic Regression. Accuracy on the male dataset was lower, especially for Naïve Bayes and Extreme Gradient Boosting. The StackingCVClassifier, which combines several models, improved predictive accuracy, achieving 92.31% accuracy on the female dataset compared with 82.76% on the male dataset, with fewer misclassified samples.

Conclusions
Heart disease was predicted more reliably on the female dataset, with higher accuracy and fewer misclassified samples. Performance on the male dataset requires further optimization, particularly for models such as Naïve Bayes and Extreme Gradient Boosting. Combining multiple models through the StackingCVClassifier enhances predictive accuracy, highlighting the importance of leveraging the strengths of individual models.
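
As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, F1 score, AUC-ROC, and the number of misclassified samples could be computed separately for each gender subgroup. It is not the authors' code. The function names (`binary_metrics`, `roc_auc`, `metrics_by_group`) and the toy records are hypothetical and invented purely to make the arithmetic checkable; they are not data or results from the study.

```python
from collections import Counter, defaultdict


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = disease)."""
    counts = Counter(zip(y_true, y_pred))
    tp, tn = counts[(1, 1)], counts[(0, 0)]
    fp, fn = counts[(0, 1)], counts[(1, 0)]
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
        "misclassified": fp + fn,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_by_group(records):
    """records: iterable of (group, true_label, predicted_label, score)."""
    grouped = defaultdict(list)
    for group, y, y_hat, score in records:
        grouped[group].append((y, y_hat, score))
    results = {}
    for group, rows in grouped.items():
        y_true = [r[0] for r in rows]
        y_pred = [r[1] for r in rows]
        scores = [r[2] for r in rows]
        results[group] = binary_metrics(y_true, y_pred)
        results[group]["auc"] = roc_auc(y_true, scores)
    return results


# Toy records, invented for illustration only (NOT the study's data):
# (group, true label, predicted label, predicted probability of disease)
records = [
    ("female", 1, 1, 0.9), ("female", 0, 0, 0.2),
    ("female", 1, 1, 0.8), ("female", 0, 0, 0.1),
    ("male", 1, 0, 0.4), ("male", 0, 0, 0.3),
    ("male", 1, 1, 0.7), ("male", 0, 1, 0.6),
]

results = metrics_by_group(records)
for group, m in sorted(results.items()):
    print(group, m)
```

Reporting each metric per subgroup, rather than pooling all patients, is what makes a gap such as the one described in the Results visible. The misclassified count is reported alongside accuracy because the subgroups may differ in size.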