Machine Learning-Based Models for Imputing Missing Values of Key Predictors of Important Clinical Outcomes in a Large-Scale Registry of Acute Myocardial Infarction
Abstract
Background
Missing data is a frequent limitation of database research. Application of machine learning-based models is a promising approach to overcome this limitation. This study aims to evaluate methods for imputing three critical but sometimes missing predictors of important clinical outcomes (e.g. mortality) in hospitalized patients presenting with acute myocardial infarction (AMI): cardiac arrest prior to arrival, heart failure on first medical contact (FMC), and cardiogenic shock on FMC.
Methods
The study included 219,146 individuals in the American Heart Association Get With The Guidelines® Coronary Artery Disease (GWTG-CAD) Registry from 2020 to 2022. We trained machine learning models, alongside a logistic regression model, to predict cardiac arrest prior to arrival, heart failure on FMC, and cardiogenic shock on FMC using nested cross-validation with a calibration layer. Important features were identified with Shapley additive explanations (SHAP), and model performance was evaluated across demographic subgroups.
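As a rough illustration of nested cross-validation with a calibration layer, the sketch below tunes a classifier in an inner loop, calibrates it on the outer training fold, and scores it on the held-out outer fold. It is not the study's pipeline. The synthetic data, the logistic regression stand-in for the boosted models, the hyperparameter grid, the fold counts, and the choice of sigmoid (Platt) calibration are all assumptions made for this example, as is the helper name `nested_cv_auroc`.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def nested_cv_auroc(X, y, param_grid, n_outer=5, n_inner=3, seed=0):
    """Return one AUROC per outer fold (hypothetical helper, illustration only)."""
    outer = StratifiedKFold(n_splits=n_outer, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in outer.split(X, y):
        X_tr, y_tr = X[train_idx], y[train_idx]

        # Inner loop: choose hyperparameters using only the outer training fold.
        inner = StratifiedKFold(n_splits=n_inner, shuffle=True, random_state=seed)
        search = GridSearchCV(
            LogisticRegression(max_iter=1000),
            param_grid,
            scoring="roc_auc",
            cv=inner,
        )
        search.fit(X_tr, y_tr)

        # Calibration layer: refit with the selected hyperparameters and learn
        # a sigmoid mapping on internal folds of the outer training data.
        tuned = LogisticRegression(max_iter=1000, **search.best_params_)
        calibrated = CalibratedClassifierCV(tuned, method="sigmoid", cv=n_inner)
        calibrated.fit(X_tr, y_tr)

        # Outer loop: score on data never seen during tuning or calibration.
        proba = calibrated.predict_proba(X[test_idx])[:, 1]
        scores.append(roc_auc_score(y[test_idx], proba))
    return np.array(scores)


# Synthetic, imbalanced binary outcome standing in for a presenting variable.
X, y = make_classification(
    n_samples=600,
    n_features=10,
    n_clusters_per_class=1,
    weights=[0.9, 0.1],
    random_state=0,
)
scores = nested_cv_auroc(X, y, {"C": [0.1, 1.0, 10.0]})
print(f"outer-fold AUROC: mean={scores.mean():.3f}, sd={scores.std():.3f}")
```

Keeping tuning and calibration inside the outer training fold means the outer-fold score is not inflated by either step.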
Results
The mean (SD) age was 65 (18) years; 32.7% of individuals were female, 71.2% were White, and 13.5% were Black. For cardiac arrest, XGBoost and LightGBM achieved areas under the receiver operating characteristic curve (AUROC) of 0.920 (95% CI, 0.915–0.924) and 0.922 (95% CI, 0.918–0.927), respectively. For cardiogenic shock, XGBoost and LightGBM achieved AUROCs of 0.912 (95% CI, 0.907–0.918) and 0.913 (95% CI, 0.908–0.918), respectively. For heart failure, XGBoost and LightGBM achieved AUROCs of 0.824 (95% CI, 0.818–0.829) and 0.826 (95% CI, 0.818–0.833), respectively. Vital signs, lab values, age, and AMI type were among the most influential features for predicting these presenting clinical variables.
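For readers unfamiliar with how an AUROC and its confidence interval can be obtained, the following minimal sketch computes a rank-based AUROC and a percentile bootstrap interval. The abstract does not state how its intervals were derived, so the bootstrap here is an assumption, and the function names `auroc` and `bootstrap_ci` and the simulated scores are hypothetical.

```python
import random


def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores and give each the average rank.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (an assumed method, not the study's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated predictions: positives tend to receive higher scores than negatives.
rng = random.Random(1)
labels = [1 if rng.random() < 0.2 else 0 for _ in range(500)]
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
point = auroc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(f"AUROC {point:.3f} (95% CI, {lower:.3f}-{upper:.3f})")
```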
Conclusions
Machine learning techniques, particularly XGBoost and LightGBM, were effective in imputing missing clinical data in this cardiovascular registry. By accurately imputing key presenting variables, the models can improve the power and quality of datasets, with implications for enhancing clinical and research practices.