A Unified Framework for Survival Prediction: Combining Machine Learning Feature Selection with Traditional Survival Analysis in Heart Failure and METABRIC Breast Cancer

Abstract

Background: The clinical adoption of machine learning (ML) in survival analysis is often hindered by the "black box" nature of complex algorithms. This study presents a unified and clinically grounded framework that integrates ML-based feature selection with traditional survival analysis to bridge the gap between algorithmic predictive power and routine clinical interpretation. Methods: We employed a hybrid approach in which ML models identified high-impact features, which were subsequently validated using Cox Proportional Hazards and Kaplan–Meier analysis. The framework was evaluated across two distinct disease domains, Heart Failure and the METABRIC breast cancer cohort, to assess robustness and generalizability. Results: In the Heart Failure dataset, risk stratification based on age, serum creatinine, and blood pressure successfully separated patients into distinct risk groups. The high-risk group exhibited significantly increased mortality compared to the low-risk group (Hazard Ratio [HR]: 2.61; 95% CI: 1.42–4.78; p = 0.0013). Validation in the METABRIC dataset confirmed the adaptability of the method; a composite risk profile utilizing age at diagnosis, HER2 status, and the Nottingham Prognostic Index (NPI) yielded strong separation (p < 0.001). The high-risk breast cancer group demonstrated an HR of 2.73 (95% CI: 2.34–3.19), with a median survival of 104.7 months compared to 252.3 months for the low-risk group, a difference of 147.6 months (approximately 12.3 years). Conclusions: This cross-disease validation demonstrates that integrating ML feature selection with established survival metrics translates complex models into concise, statistically robust, and clinically interpretable prognostic factors, offering a scalable methodology for diverse clinical contexts.
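
The median survival figures reported above come from Kaplan–Meier curves fitted to each risk group. As a rough illustration of how such a median is read off a curve, the following is a minimal sketch of the standard product-limit estimator in plain Python. It is not the authors' code: the function names `kaplan_meier` and `median_survival` and the toy follow-up times are hypothetical, chosen only for illustration, and none of the numbers come from the Heart Failure or METABRIC data.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: returns (t, S(t)) at each distinct event time.

    events[i] is 1 if the subject died at times[i], 0 if censored there.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set.
        at_risk -= removed
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which S(t) drops to 0.5 or below; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical toy follow-up times in months (NOT study data).
high_risk_curve = kaplan_meier([5, 8, 12, 20, 25], [1, 1, 1, 0, 1])
low_risk_curve = kaplan_meier([10, 20, 30, 40, 50], [0, 1, 0, 1, 0])

print("high-risk median:", median_survival(high_risk_curve))
print("low-risk median:", median_survival(low_risk_curve))
```

In a setting like the one described, each patient would first be assigned to a risk group from the selected features, and the group curves would then be compared with a log-rank test or a Cox model. This sketch covers only the curve and median step.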
