Optimized Feature Selection and Advanced Machine Learning for Stroke Risk Prediction in Revascularized Coronary Artery Disease Patients

Abstract

Background

Coronary artery disease (CAD) is a leading cause of mortality, with stroke being a major complication following coronary revascularization procedures such as percutaneous coronary intervention (PCI) and coronary artery bypass grafting (CABG). While machine learning (ML) has been used to predict postoperative outcomes, a gap remains in quantifying stroke risk in revascularized CAD patients. This study aims to develop and validate ML models for stroke risk prediction, enhancing clinical decision-making.

Methods

We extracted data on 5,757 patients from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database and applied Pearson correlation analysis, least absolute shrinkage and selection operator (LASSO), ridge regression, and elastic net for feature selection, reducing the initial 35 features to 14. The dataset was split into training (70%), testing (15%), and validation (15%) sets. We evaluated multiple ML models, including logistic regression, XGBoost, random forest, AdaBoost, Bernoulli naive Bayes, k-nearest neighbors (KNN), and CatBoost. Model performance was assessed using the area under the receiver operating characteristic curve (AUC-ROC), with 95% confidence intervals (CIs) estimated from 500 bootstrap resamples.
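The splitting and evaluation steps described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the function names, the random seed, the percentile-style interval, and the synthetic labels and scores are all assumptions made for illustration, and a real analysis would score held-out predictions from a fitted model.

```python
import random

def split_indices(n, seed=0, train=0.70, test=0.15):
    """Shuffle 0..n-1 and cut into train/test/validation index lists."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = int(n * train)
    n_test = int(n * test)
    # Whatever remains after the first two cuts is the validation set.
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

def auc_roc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Approximate percentile CI for AUC from n_boot resamples with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # a single-class resample has no defined AUC; redraw
        aucs.append(auc_roc(ys, [scores[i] for i in sample]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic stand-in data (hypothetical, not from MIMIC-IV).
demo_rng = random.Random(42)
labels = [1 if demo_rng.random() < 0.3 else 0 for _ in range(150)]
scores = [0.5 * y + demo_rng.random() for y in labels]

train_idx, test_idx, val_idx = split_indices(5757)
print(len(train_idx), len(test_idx), len(val_idx))
print(auc_roc(labels, scores), bootstrap_auc_ci(labels, scores))
```

Skipping single-class resamples is one common convention. With a rare outcome such as stroke, a stratified bootstrap that resamples positives and negatives separately may be preferable.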

Results

CatBoost achieved the highest performance, with an AUC of 0.8486 (95% CI: 0.8124–0.8797) on the test set and 0.8511 (95% CI: 0.8203–0.8793) on the validation set. Shapley Additive Explanations (SHAP) identified Charlson Comorbidity Index (CCI), length of stay (LOS), and treatment types as key predictors. Our model outperformed models reported in existing studies, improving AUC by 9% while using a more refined feature set.
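SHAP attributions are based on Shapley values from cooperative game theory. The sketch below computes exact Shapley values for a toy linear risk score, to show what a per-feature attribution means. Everything in it is hypothetical: the weights, the patient and baseline values, and the three feature names are invented for illustration and are unrelated to the fitted CatBoost model or the study data.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that does not contain feature i.
            weight = (factorial(k) * factorial(n_features - k - 1)
                      / factorial(n_features))
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

# Hypothetical toy model: linear score over (CCI, LOS in days, CABG flag).
weights = [0.30, 0.05, 0.40]   # invented coefficients
patient = [6.0, 12.0, 1.0]     # invented patient
baseline = [2.0, 5.0, 0.0]     # invented reference values

def value_fn(present):
    """Score with features in `present` at patient values, the rest at baseline."""
    return sum(w * (patient[j] if j in present else baseline[j])
               for j, w in enumerate(weights))

phi = shapley_values(value_fn, len(weights))
print(phi)
```

For a linear score each attribution reduces to the weight times the feature's deviation from baseline, and the attributions sum to the difference between the patient's score and the baseline score. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on model-specific or approximate algorithms.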

Conclusions

By integrating multiple feature selection methods, we obtained a streamlined model that improves efficiency and reliability. The proposed CatBoost model offers a high-accuracy approach to predicting postoperative stroke in CAD patients undergoing revascularization, supporting clinical decision-making and preventive strategies.
