Unified Approach for Accurate Heart Disease Prediction Using Machine Learning Techniques
Abstract
Cardiovascular diseases (CVDs) account for a large share of worldwide morbidity, disability, and premature mortality, posing a critical public health challenge. Early identification and proactive treatment can greatly reduce the risk and severity of these conditions, so much effort has focused on estimating the probability that an individual will experience a major cardiovascular event. Machine learning offers a promising alternative to conventional risk models and can improve the accuracy of health outcome predictions. This research presents a machine learning pipeline for heart disease prediction that combines the XGBoost algorithm, feature selection, and automated hyperparameter tuning with Optuna. First, important features were selected using XGBoost-based importance scores, which improved model interpretability and reduced dimensionality. Optuna's Tree-structured Parzen Estimator (TPE) sampler was then used to search the hyperparameter space and optimize the classification model efficiently. To address class imbalance, SMOTE was integrated into the pipeline. On the test set, the final model achieved 99.02% accuracy, 99.813% precision, 100% recall, a 99.05% F1-score, and a ROC-AUC of 0.9998. The dataset contains 1,025 instances, each with 14 features, drawn from the Cleveland, Hungary, Switzerland, and Long Beach V databases. The results indicate that integrating ensemble learning, feature selection, and hyperparameter tuning enhances the reliability of predictive models for cardiovascular disease detection.
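The abstract does not say how SMOTE was configured, so the following is only a minimal pure-Python sketch of SMOTE's core idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. It is not the authors' implementation, and in practice a library implementation would normally be used. The function name `smote_oversample`, the neighbour count `k`, the seed, and the toy data are all illustrative assumptions.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    Returns a list of n_synthetic new feature vectors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical values, not taken from the paper's dataset)
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_oversample(minority, n_synthetic=6, k=3)
```

Oversampling of this kind is normally applied to the training split only, so that the test set remains untouched real data and the reported metrics are not inflated by synthetic samples.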