Explainable AI Based Coronary Heart Disease Prediction: Enhancing Model Transparency in Clinical Decision Making
Abstract
Introduction
Coronary heart disease (CHD) remains a leading cause of death worldwide, so early detection and risk stratification are essential to prevent major cardiovascular events. This study compares the predictive accuracy of eight machine learning models for CHD diagnosis using clinical and demographic features. It also examines the contribution and direction of influence of the most important features to improve model interpretability.
Methods
We compared the predictive accuracy of eight machine learning models for CHD classification. We identified the most important features from the top-performing models and applied SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME) to examine how each feature affects the model’s predictions. These interpretability methods show the direction and magnitude of each feature’s contribution, supporting transparency in AI-based CHD risk prediction.
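To make "direction and magnitude of feature contributions" concrete, the following is a minimal sketch of the exact Shapley value computation that SHAP approximates. It is not the study's pipeline: the scoring function `toy_risk`, its coefficients, the instance `x`, and the all-zero baseline are hypothetical values chosen only for illustration, and the feature names are borrowed from the abstract purely as labels.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline.

    Features outside a coalition are replaced by their baseline values.
    Cost grows exponentially with the number of features, so this is
    only practical for very small examples.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk(features):
    """Hypothetical risk score (NOT the study's model)."""
    st_slope, chest_pain, oldpeak = features
    return 0.1 + 0.3 * st_slope + 0.2 * chest_pain + 0.1 * oldpeak * st_slope


x = [1.0, 1.0, 2.0]          # hypothetical patient
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_risk, x, baseline)

for name, value in zip(["ST slope", "chest pain type", "old peak"], phi):
    direction = "raises" if value > 0 else "lowers"
    print(f"{name}: {value:+.3f} ({direction} the score)")
```

The sign of each value gives the direction of a feature's influence and its absolute size gives the magnitude. The values sum to the difference between the prediction for `x` and the prediction at the baseline, which is the additivity property that SHAP plots rely on.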
Results
XGBoost and Random Forest achieved the highest testing accuracies, 0.839 and 0.805, with training accuracies of 0.901 and 0.957, respectively. The small train-test gap for XGBoost (0.062) indicates a well-fitted model, whereas the larger gap for Random Forest (0.152) indicates significant overfitting. ECG-associated features, such as resting ECG and old peak (ST depression induced by exercise), also rank highly, supporting the importance of cardiac electrical activity in diagnosis. ST slope has the highest impact, followed by chest pain type and old peak; high values of these features contribute positively to the prediction and increase the predicted likelihood of heart disease. Resting blood pressure (resting bps), sex, and fasting blood sugar have lower impact on the model’s predictions.
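The overfitting comparison rests on the gap between training and testing accuracy. The short sketch below computes that gap from the four accuracies reported in this paragraph; the helper name `generalization_gap` is introduced here for illustration and does not come from the study.

```python
# Accuracies as reported in the Results paragraph
reported = {
    "XGBoost": {"train": 0.901, "test": 0.839},
    "Random Forest": {"train": 0.957, "test": 0.805},
}


def generalization_gap(train_acc, test_acc):
    """Training accuracy minus testing accuracy, rounded to 3 decimals."""
    return round(train_acc - test_acc, 3)


gaps = {name: generalization_gap(acc["train"], acc["test"])
        for name, acc in reported.items()}

for name, gap in gaps.items():
    print(f"{name}: train-test gap = {gap:.3f}")
```

A larger gap means the model fits the training data noticeably better than unseen data. Here the Random Forest gap is more than twice the XGBoost gap.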
Conclusions
In conclusion, machine learning models, particularly XGBoost and Random Forest, show substantial predictive accuracy for coronary heart disease, with testing AUROCs of 0.885. Feature importance, SHAP, and LIME analyses highlight the critical role of ECG-derived metrics such as ST slope and resting ECG, along with chest pain type, while traditional risk factors such as cholesterol, resting bps, and fasting blood sugar have less influence.