Machine Learning-Based Models for Imputing Missing Values of Key Predictors of Important Clinical Outcomes in a Large-Scale Registry of Acute Myocardial Infarction

Haoyun Hong
Remy Poudel
Asishana Osho
Kathie Thomas
Shen Li
Abhinav Goyal
Jennifer L. Hall
Juan Zhao

Abstract

Background

Missing data are a frequent limitation of database research, and machine learning-based models are a promising way to overcome it. This study evaluates methods for imputing three critical but sometimes missing predictors of important clinical outcomes (e.g., mortality) in hospitalized patients presenting with acute myocardial infarction (AMI): cardiac arrest prior to arrival, heart failure on first medical contact (FMC), and cardiogenic shock on FMC.

Methods

The study included 219,146 individuals participating in the American Heart Association Get With The Guidelines® Coronary Artery Disease (GWTG-CAD) Registry from 2020 to 2022. We trained machine learning models, alongside a logistic regression model, to predict cardiac arrest prior to arrival, heart failure on FMC, and cardiogenic shock on FMC using nested cross-validation with a calibration layer. Important features were identified through Shapley additive explanations (SHAP), and model performance was evaluated across demographic subgroups.

Results

The mean (SD) age was 65 (18) years; 32.7% of individuals were female, 71.2% were White, and 13.5% were Black. For cardiac arrest, XGBoost and LightGBM achieved AUROC scores of 0.920 (95% CI, 0.915 - 0.924) and 0.922 (95% CI, 0.918 - 0.927), respectively. For cardiogenic shock, they achieved AUROC scores of 0.912 (95% CI, 0.907 - 0.918) and 0.913 (95% CI, 0.908 - 0.918). For heart failure, they achieved AUROC scores of 0.824 (95% CI, 0.818 - 0.829) and 0.826 (95% CI, 0.818 - 0.833). Vital signs, lab values, age, and AMI type were among the most influential features for predicting these clinical presenting variables.
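The abstract does not state how the 95% confidence intervals were obtained. As an illustration only, one common choice is a percentile bootstrap over the evaluation set; the sketch below assumes that method, uses synthetic labels and scores, and introduces the hypothetical helper names `auroc` and `bootstrap_ci`.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly
    (ties count as half a correct pair)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic evaluation set (hypothetical): about 20% positives, whose scores
# are shifted upward by one standard deviation on average.
data_rng = random.Random(1)
labels = [1 if data_rng.random() < 0.2 else 0 for _ in range(200)]
scores = [data_rng.gauss(1.0 if l else 0.0, 1.0) for l in labels]

point = auroc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(f"AUROC {point:.3f} (95% CI, {lower:.3f} - {upper:.3f})")
```

The pairwise AUROC here is quadratic in sample size, which is fine for a toy example; a rank-based implementation would be preferable at registry scale.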

Conclusions

We demonstrated that machine learning techniques, particularly XGBoost and LightGBM, are effective for imputing missing clinical data in cardiovascular studies. By accurately imputing major cardiac events, the models improve the power and quality of datasets, with implications for enhancing clinical and research practice.

Version published to 10.1101/2025.05.29.25328603v1 on medRxiv
May 31, 2025

Short-term and long-term outcome prediction for patients with coronary artery disease using machine learning and comprehensive multi-center patient data

This article has 10 authors:
1. Emma Bogner
2. Bryan Har
3. Bing Li
4. Danielle Southern
5. Christopher Liang Feng Sun
6. Robert C Welsh
7. Benjamin Tyrrell
8. Colm J Murphy
9. Arjun Puri
10. Joon Lee
This article has no evaluations. Latest version: May 27, 2025

Accurate machine learning-based CVD risk prediction in primary care may reduce the need for routine health care checks

This article has 10 authors:
1. Katarzyna Dziopa
2. Sophie Eastwood
3. Daniel Bos
4. Maryam Kavousi
5. Maarten J.G. Leening
6. Joline W J Beulens
7. Peter P Harms
8. Nishi Chaturvedi
9. Folkert W Asselbergs
10. Amand Floriaan Schmidt
This article has no evaluations. Latest version: Jun 13, 2025

Retrospective Machine Learning Approach for Forecasting In-Hospital Death in ICU Patients After Cardiac Arrest

This article has 8 authors:
1. Yong Si
2. Li Sun
3. Shuheng Chen
4. JunYi Fan
5. Elham Pishgar
6. Kamiar Alaei
7. Greg Placencia
8. Maryam Pishgar
This article has no evaluations. Latest version: May 6, 2025
