Short-term and long-term outcome prediction for patients with coronary artery disease using machine learning and comprehensive multi-center patient data
Abstract
Background
Revascularization decision-making for patients with coronary artery disease (CAD) can benefit from accurate patient outcome prediction. While previous studies have employed data-driven methods, including machine learning (ML), to develop prediction models, they were mostly based on small patient cohorts with strict inclusion and exclusion criteria, limited feature sets, and only internal validation.

Objectives
To develop and externally validate ML-based models that predict a wide range of short- and long-term outcomes for patients with obstructive CAD using large-scale multi-center patient data.

Methods
Comprehensive data from patients with obstructive CAD who underwent coronary angiography at three hospitals in Alberta, Canada, between 2009 and 2019 were extracted from the APPROACH Registry and linked administrative health databases. To predict all-cause mortality and major adverse cardiovascular events (MACE) at 90 days, 1 year, 3 years, and 5 years, over 12,000 features were considered in an extensive ML framework that employed rigorous hyperparameter tuning, calibration, algorithmic bias assessment, and external validation. In addition to traditional ML models, we employed TabPFN, a generative transformer-based tabular foundation model. To increase the clinical utility of these prediction models, we also performed a secondary analysis of how excluding angiography data affects prediction performance.

Results
A total of 44,462 catheterizations from 38,767 unique patients were included in the study. In external validation, the median areas under the receiver operating characteristic curve (AUROCs) of the best models, mostly TabPFNs, ranged from 0.797 to 0.845 for mortality and from 0.694 to 0.753 for MACE. CAD factors, angiography results, and patient history were the most influential feature groups. The algorithmic bias assessment, which focused on patient sex, showed that the models were mostly fair. The secondary analysis showed that prediction performance degraded slightly when angiography features were excluded.

Conclusions
The prediction performance reported in this study is state-of-the-art compared with previous studies. The large sample size, extensive feature set, external validation, and transformer architecture led to personalized models with robust performance. These models have the potential to improve coronary revascularization decision-making and patient outcomes through accurate prognosis.
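To make the discrimination metric and the sex-focused bias check more concrete, the sketch below computes the AUROC overall and separately for each sex. It is illustrative only and rests on several assumptions. The abstract does not say which fairness metrics the authors used, so comparing subgroup AUROCs is offered here as one common check, not as the study's method. The function names `auroc` and `auroc_by_group` are hypothetical, and the scores, labels, and sex values are toy data rather than study results. The AUROC is computed via its Mann-Whitney interpretation: the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half.

```python
from itertools import product


def auroc(scores, labels):
    """AUROC via the Mann-Whitney formulation: fraction of (positive, negative)
    pairs in which the positive case has the higher score (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_by_group(scores, labels, groups):
    """Compute AUROC separately within each subgroup (hypothetically, patient sex)."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = auroc([scores[i] for i in idx], [labels[i] for i in idx])
    return result


# Toy, made-up predicted risks, observed outcomes (1 = event), and sex labels.
scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.4, 0.6, 0.1]
labels = [1, 1, 0, 0, 1, 0, 0, 1]
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]

print(auroc(scores, labels))                # 0.75
print(auroc_by_group(scores, labels, sex))  # {'F': 1.0, 'M': 0.5}
```

A large gap between subgroup AUROCs, such as the deliberately exaggerated one in this toy example, would flag a potential bias. The explicit pairwise loop is O(n·m) and is written for clarity. For cohorts of the size described above, a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.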