Development and Validation of a parsimonious AI-Based Risk Score for Mortality in Heart Failure: A UK cohort study

Abstract

Background

Accurate risk stratification in heart failure (HF) is crucial to guide clinical decisions, optimise therapeutic strategies and inform resource allocation. Widely used existing tools, such as the score from the Meta-Analysis Global Group in Chronic Heart Failure (MAGGIC), show modest discriminatory performance and rely on specialised tests (e.g., echocardiography). Advanced artificial intelligence (AI) models, though more accurate, face significant barriers to adoption and integration into routine care pathways because of their complexity and data access issues. By leveraging insights from a complex AI model and feature engineering, we aim to develop and validate a parsimonious yet high-performing AI-based risk model for all-cause mortality in HF patients, designed specifically for easy interpretation and accessibility in a clinical setting.

Methods

Using a cohort of 373,389 patients with HF (aged ≥18 years; 1,153 English general practices for derivation, 289 for validation) from the Clinical Practice Research Datalink (CPRD) Aurum dataset, we developed and validated an AI-based risk prediction model for all-cause mortality. An initial Multi-layer Perceptron (MLP) model with a survival framework was trained on the variables of the MAGGIC score. This MLP model was then enhanced by incorporating highly predictive comorbidity features, identified through an explainable Transformer model trained on longitudinal electronic health record (EHR) data, and subsequently distilled through a novel feature engineering approach into a final, optimised 11-variable model named SIMPLE-HF (Simplified Intelligent Mortality Prediction for Longitudinal EHRs in HF patients). The resulting model uses variables readily available at the point of care, including age, BMI, year of birth (as a proxy for birth cohort effects), and key comorbidities such as cancers. We examined the discrimination and calibration of SIMPLE-HF on the validation dataset and compared its performance against a MAGGIC score adapted for use on routine EHR data (termed MAGGIC-EHR). In secondary analyses, we investigated the performance of SIMPLE-HF for the prediction of major adverse cardiovascular events and hospitalisation.
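Discrimination for survival outcomes is usually summarised with Harrell's concordance index (C-index): among comparable patient pairs, the fraction in which the patient who died earlier was assigned the higher predicted risk. The abstract does not state which C-index estimator was used, so the following is only a minimal pure-Python sketch of Harrell's version under that assumption. The function name `harrell_c_index` is hypothetical, pairs tied on follow-up time are simply not counted as comparable, and the bootstrap needed for confidence intervals is omitted.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times:  follow-up time for each patient (to death or censoring)
    events: 1 if death was observed, 0 if the patient was censored
    risks:  predicted risk score; a higher value means a worse prognosis
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        # Only a patient with an observed death can anchor a comparable pair.
        if not events[i]:
            continue
        for j in range(len(times)):
            # Patient j must still be under follow-up after i's death.
            # Pairs tied on time are skipped in this simplified sketch.
            if times[j] > times[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier death ranked as higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data: risk scores perfectly ordered by time of death.
    print(harrell_c_index([1, 2, 3, 4], [1, 1, 0, 1], [4, 3, 2, 1]))  # → 1.0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The double loop is O(n²), so for a cohort of the size described here an established implementation such as `lifelines.utils.concordance_index` would be the practical choice; that function treats higher scores as longer survival, so risk scores would need to be negated before being passed in.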

Findings

The SIMPLE-HF model demonstrated significantly better discrimination (C-index: 0.801, 95% CI [0.795, 0.806]) than the benchmark MAGGIC-EHR Cox model (0.735, [0.728, 0.741]), while maintaining acceptable calibration. Clinical impact analysis further revealed that SIMPLE-HF consistently captures more true events while flagging fewer patients as high-risk. At a high-risk threshold of 0.60, for every 1000 patients screened, SIMPLE-HF identifies 199 patients who will have an event, whereas MAGGIC-EHR captures only 99 true events. SIMPLE-HF also outperformed the benchmark model in predicting cardiovascular events and hospitalisation.
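Per-1000 figures like these are the quantities plotted in a clinical impact curve: at a given risk threshold, how many screened patients are flagged as high-risk and how many of those go on to have the event. The minimal sketch below computes both counts from predicted risks and observed outcomes. It assumes each risk is a probability of the outcome by some fixed horizon (the horizon is not stated in this abstract), that patients at or above the threshold are flagged, and that outcomes are fully observed; censoring, which a real survival analysis must handle, is ignored. The function name `impact_per_1000` is hypothetical.

```python
from typing import Sequence, Tuple


def impact_per_1000(
    risks: Sequence[float],
    events: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Return (patients flagged, true events among flagged) per 1000 screened.

    Illustrative sketch: assumes fully observed binary outcomes (no censoring).
    risks:     predicted probability of the outcome by a fixed horizon
    events:    1 if the patient had the outcome by that horizon, else 0
    threshold: cut-off at or above which a patient is flagged as high-risk
    """
    n = len(risks)
    if n == 0 or n != len(events):
        raise ValueError("risks and events must be non-empty and equal length")
    # Outcomes of the patients the model flags as high-risk.
    flagged_events = [e for r, e in zip(risks, events) if r >= threshold]
    scale = 1000 / n  # rescale counts to a screened population of 1000
    return len(flagged_events) * scale, sum(flagged_events) * scale


if __name__ == "__main__":
    # Toy cohort of 5 patients: 3 reach the 0.60 threshold, 2 of whom had an event.
    print(impact_per_1000([0.9, 0.7, 0.65, 0.3, 0.1], [1, 1, 0, 1, 0], 0.60))
    # → (600.0, 400.0)
```

When two models are compared at the same threshold on the same validation cohort, the one that flags fewer patients yet captures more true events, as reported for SIMPLE-HF versus MAGGIC-EHR, is the more efficient screening tool at that cut-off.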

Interpretation

By distilling insights from a complex EHR-trained AI model into easily collected clinical features, we improve both the accuracy and the practicality of HF mortality risk prediction. This novel approach offers a pathway to clinically implementable tools that balance predictive accuracy with pragmatic utility. It has the potential to improve HF risk stratification while offering significant cost efficiency by removing the need for specialised tests.

Research in context

Evidence before this study

Before this study, the risk stratification of patients with heart failure (HF) presented a clear challenge. Existing clinical risk scores, such as the widely known MAGGIC score, demonstrate only modest predictive accuracy (typically with C-indices between 0.61 and 0.75) and often depend on variables from specialised tests like echocardiography, which are not always available in routine care, particularly in primary care settings. This limits their practical utility. In contrast, advanced artificial intelligence (AI) models, such as those using Transformer architectures trained on extensive longitudinal electronic health records, have shown superior predictive performance (C-indices > 0.80). However, their complexity, reliance on vast datasets, and significant computational requirements create major barriers to their adoption and integration into everyday clinical workflows. On Feb 1, 2025, we searched PubMed for English-language studies published between Jan 1, 2015, and Jan 1, 2025, using the terms “heart failure”, “risk prediction”, “mortality”, “prognostic model”, and “machine learning”. Our search did not identify any parsimonious risk models that achieved high discrimination and were specifically validated on a large, representative UK cohort using only readily available clinical data.

Added value of this study

This study introduces and validates SIMPLE-HF, a parsimonious, 11-variable risk score for all-cause mortality in HF patients that directly addresses the performance-practicality gap. Using a novel “distillation” methodology, we translated the predictive insights from a complex, high-performing Transformer AI model into a simple, interpretable model. Our model achieves significantly higher discrimination (C-index: 0.801) for all-cause mortality than an EHR-adapted benchmark model (MAGGIC-EHR C-index: 0.735). Crucially, SIMPLE-HF relies exclusively on variables that are routinely collected at the point of care, eliminating the need for specialised tests. Furthermore, the model demonstrates superior clinical impact: for every 1000 patients screened at a 60% risk threshold, SIMPLE-HF correctly identifies 199 patients who will have an event, compared with only 99 identified by the benchmark model.

Implications of all the available evidence

The combined evidence suggests that SIMPLE-HF is a robust and practical tool that can be immediately implemented in diverse clinical settings, including primary care, to improve the accuracy of heart failure risk stratification. For clinical practice, it offers a more accessible and precise method to guide therapeutic decisions and resource allocation without requiring additional specialised tests. For future research, our distillation methodology provides a new paradigm for developing clinically implementable prediction tools. This approach harnesses the power of complex AI to create simple, transparent, and effective models, a strategy that could be applied to improve risk prediction across numerous other medical conditions. External validation of the SIMPLE-HF model in diverse international cohorts is a recommended next step to confirm its generalisability.
