Enhanced machine learning and hybrid ensemble approaches for coronary heart disease prediction
Abstract
Introduction: Coronary heart disease (CHD) remains the leading cause of mortality worldwide, disproportionately affecting low- and middle-income countries where diagnostic resources are limited. Traditional statistical models often fail to deliver adequate predictive accuracy in complex, high-dimensional, and imbalanced health datasets.
Objective: To develop and evaluate enhanced machine learning and hybrid ensemble models for the prediction of coronary heart disease, with a focus on improving diagnostic performance, interpretability, and applicability in resource-constrained settings.
Methods: We utilized a nationally representative dataset of 253,680 individuals from the Behavioral Risk Factor Surveillance System. Preprocessing included normalization and balancing via the Synthetic Minority Oversampling Technique (SMOTE). Baseline models (Decision Trees, Random Forests, Gradient Boosting, and Support Vector Machines) were compared against improved versions: Adaptive Noise–Resistant Decision Tree (ADNRT), Hybrid Imbalanced Random Forest (HIRF), Pruned Gradient Boosting Machine (PGBM), and Enhanced Support Vector Machine (ESVM). Ensemble approaches (stacking, boosting, bagging, Bayesian model averaging, and majority voting) were implemented and evaluated using accuracy, sensitivity, specificity, and area under the curve (AUC). Calibration and learning curves were also analyzed.
Results: Enhanced models consistently outperformed their baseline counterparts. PGBM achieved the highest sensitivity (90.8%), while HIRF demonstrated the best overall calibration and balance (AUC = 0.937; sensitivity = 88.4%; specificity = 82.9%). The stacking ensemble emerged as the best-performing model with an accuracy of 87.2%, sensitivity of 89.6%, specificity of 84.7%, and AUC of 0.94. Calibration and learning curve analyses confirmed strong generalizability and low overfitting across ensemble models.
Conclusion: Hybrid ensemble machine learning models significantly outperform traditional classifiers in CHD prediction, offering high accuracy, robustness, and interpretability. These models present a scalable framework for implementing AI-driven diagnostic tools in low-resource environments, potentially transforming early detection and prevention of coronary heart disease.
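To make the evaluation terms in the abstract concrete, the following is a minimal sketch of majority voting over binary classifiers and of how accuracy, sensitivity, and specificity follow from the confusion-matrix counts. It is not the authors' implementation: the function names `majority_vote` and `binary_metrics` are hypothetical, and the toy labels are invented for illustration and do not come from the study's data.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    `predictions` is a list of equal-length label lists, one per model.
    With an odd number of binary classifiers, ties cannot occur.
    """
    combined = []
    for sample_votes in zip(*predictions):
        # Pick the label that most models predicted for this sample.
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined


def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for 0/1 labels (1 = CHD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of actual negatives that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Illustrative toy labels (hypothetical, not from the BRFSS dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 0, 1, 0, 0, 1, 0]
model_b = [1, 0, 1, 0, 0, 1, 0, 0]
model_c = [1, 1, 1, 0, 0, 0, 0, 1]

ensemble = majority_vote([model_a, model_b, model_c])
metrics = binary_metrics(y_true, ensemble)
print(ensemble)
print(metrics)
```

In this toy case each individual model makes two errors, while the voted prediction makes one, which illustrates why combining classifiers can improve on its members. The stacking ensemble reported in the abstract differs in that it trains a second-level model on the base models' outputs rather than counting votes.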