Long-term Heart Risk Prediction by Survival Analysis in Echocardiography: Leveraging Machine Learning, Interpretability Techniques, and Advanced Statistical Modelling

Abstract

Background/Objectives: Survival analysis is critical for predicting time-to-event outcomes in cardiovascular care, such as patient survival following heart failure. This study uses the UCI Echocardiogram dataset to enhance survival analysis by integrating Random Survival Forests (RSF) with SurvSHAP(t) (Shapley Additive Explanations for survival models), fractional polynomial modelling, and Bayesian methods. We addressed the limitations of traditional Cox models by capturing non-linear relationships, time-varying effects, and causal interactions.

Methods: The dataset comprised 100,000 samples and 17 variables, including 132 samples with variables such as age, wall motion index (WMI), and fractional shortening (FS), and was preprocessed to address missing values and outliers. RSF was applied to model complex interactions between predictors and survival outcomes. SurvSHAP(t) provided interpretability, identifying age and WMI as the most influential predictors. Fractional polynomial modelling captured non-linear relationships, enhancing the model's adaptability. Bayesian survival analysis quantified uncertainty, and causal inference (propensity score matching) evaluated treatment effects. DeepHitSingle and validation metrics (Brier score and concordance index [C-index]) were used to assess performance.

Results: The integrated approach demonstrated high predictive accuracy, achieving a Brier score of 0.141 and a C-index of 0.86. Kaplan-Meier analysis indicated a survival probability of 75% at 10 months and approximately 60% at 40 months. RSF identified age (VIMP = 10) and WMI (VIMP = 20) as the top predictors, with SHAP analysis confirming their dynamic contributions, whereas pericardial effusion (PE) exhibited negligible predictive influence. Fractional polynomials effectively captured non-linear effects, such as age^0.5 (HR = 1.03). Bayesian posterior estimates demonstrated reliability, with a baseline hazard of 2.036 (95% highest density interval [1.83, 2.24]). Additionally, causal analysis revealed that smoking status had a minimal effect (average treatment effect, ATE = 7.47 × 10^-5).

Conclusion: Combining RSF, interpretability techniques (SurvSHAP(t), SHAP, LIME), and advanced statistical modelling (fractional polynomials, Bayesian methods) significantly improves survival analysis. The framework provides personalised risk stratification, validated through synthetic data and clinical decision-making, enabling early optimised intervention for high-risk groups and offering a transformative tool for echocardiography-based cardiovascular care for heart failure patients.
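For readers unfamiliar with two of the quantities reported in the abstract, the Kaplan-Meier survival probability and the concordance index, the following is a minimal sketch of how each can be computed from right-censored data. It is not the authors' code: the function names, the toy follow-up times, the event flags, and the risk scores are all hypothetical and chosen only for illustration, and it uses plain Python rather than any survival library.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate: list of (event time, survival probability)."""
    # Only times at which an event (not a censoring) occurred change the curve.
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risk: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs ranked correctly by risk score."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # earlier event had the higher predicted risk
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in risk count as half
    return concordant / comparable


# Hypothetical toy data: follow-up in months, event flag (1 = death, 0 = censored),
# and a made-up model risk score per patient.
toy_times = [5, 10, 10, 20, 40]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.7, 0.5, 0.1]

curve = kaplan_meier(toy_times, toy_events)
c_index = concordance_index(toy_times, toy_events, toy_risk)
```

On this toy data, 6 of the 7 comparable pairs are ordered correctly, so the concordance index is 6/7. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.86 should be read.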
