Predicting Near-term Mortality in Heart Failure: External Validation of Electronic Health Record-Based Deep Learning Model

Abstract

Background

When patients with heart failure (HF) do not respond to guideline-directed medical therapy, the dire consequences often prompt early, non-selective referral for surgical intervention (ventricular assist device [VAD] or transplant). The high risk associated with these interventions demands precision in directing them only toward patients who would otherwise suffer severe near-term deterioration. We previously reported a 52,265-patient deep learning model that predicted 1-year severe decompensation/death in HF inpatients with a C-statistic of 0.91. We now present external validation of that model. Few groups applying deep learning to large-scale datasets have achieved external validation using equally large-scale independent datasets, yet proof of generalization is essential to practical applicability.

Methods

Our previous study used standard electronic health record (EHR) data to build ensemble deep learning models employing time-series and densely connected networks. The positive class included both all-cause mortality and referral for HF surgical intervention within 1 year. In the current study, we assessed generalization of the model architecture in an external validation test set from the Veterans Cardiac Health and Artificial Intelligence Model Predictions (V-CHAMPS) challenge, a synthetic national governmental sample using a distinct EHR system. While V-CHAMPS is a robust dataset, variables capturing VAD/transplant referral could not be readily extracted, which limited the positive class to mortality only.
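
The mortality-only positive class used for external validation can be illustrated with a minimal sketch. The function name, its arguments, the choice of admission date as the start of the 1-year window, and the reading of "1 year" as 365 days are all assumptions made for illustration; they are not taken from the V-CHAMPS data dictionary or from the authors' code.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: "within 1 year" is read as 365 days counted from admission.
ONE_YEAR = timedelta(days=365)


def is_positive(admission_date: date, death_date: Optional[date]) -> bool:
    """Return True if death was recorded within one year after admission.

    Hypothetical helper: field names and window definition are illustrative.
    """
    if death_date is None:
        # No recorded death: the admission belongs to the negative class.
        return False
    return admission_date <= death_date <= admission_date + ONE_YEAR


if __name__ == "__main__":
    print(is_positive(date(2020, 1, 1), date(2020, 6, 1)))   # death within window
    print(is_positive(date(2020, 1, 1), date(2021, 6, 1)))   # death after window
    print(is_positive(date(2020, 1, 1), None))               # no recorded death
```

In the original single-center study the label would additionally be positive for a VAD/transplant referral within the same window; that branch is omitted here because the external dataset did not expose it.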

Results

A total of 380,441 distinct admissions from 75,086 HF patients contributed >720 million EHR datapoints. Of these observations, 23% met positive-class criteria. The model C-statistic in the external validation cohort was 0.79.
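
For readers unfamiliar with the metric, the C-statistic for a binary outcome is the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is an illustrative implementation on toy data, not the authors' evaluation code, and the function name `c_statistic` is hypothetical.

```python
from typing import Sequence


def c_statistic(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Concordance statistic for binary labels (1 = positive, 0 = negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # positive case ranked above negative case
            elif p == n:
                concordant += 0.5   # ties count as half-concordant
    return concordant / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data only; these values are not drawn from the study.
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.9, 0.2, 0.4, 0.6]
    print(c_statistic(toy_labels, toy_scores))  # 3 of 4 pairs concordant: 0.75
```

The pairwise loop is quadratic in the number of cases, so a cohort of several hundred thousand admissions would normally be scored with a rank-based computation instead; the definition is the same.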

Conclusions

Despite being developed in a single-center dataset with a more precise positive class, our model architecture retained useful discrimination (C-statistic 0.79, compared with 0.91 in the original cohort) when applied to a national sample in an unrelated EHR system. This supports the clinical relevance of the deep learning model and its adaptability, with retraining, to disparate contexts. Such broad applicability suggests considerable potential for EHR-based deep learning models to assist HF clinicians in improving the use of advanced surgical therapy.
