A Unified Framework for Survival Prediction: Combining Machine Learning Feature Selection with Traditional Survival Analysis in Heart Failure and METABRIC Breast Cancer

Fangya Tan
Jian-Guo Zhou
Shuqiao Li
Bowen Long
Srikar Bellur
Yang Zhou
Mark Newman

Abstract

Background: The clinical use of machine learning (ML) in survival analysis is often limited by the “black box” nature of complex algorithms, which makes their results difficult to interpret in practice. In this study, we propose a unified, clinically grounded framework that integrates ML-based feature selection with traditional survival analysis, aiming to bridge the gap between strong predictive performance and clear, clinically meaningful interpretation.

Methods: High-impact prognostic clinical features were identified using three ML models (GBM-Cox, RSF, and LASSO-Cox) with 5-fold stratified cross-validation, and were subsequently validated using Cox proportional hazards regression and Kaplan–Meier analysis. The framework was evaluated across two distinct disease domains, heart failure and the METABRIC breast cancer cohort, to assess robustness and generalizability.

Results: In the Heart Failure dataset, age group, serum creatinine, and blood pressure stratified patients into distinct risk groups. The high-risk group had significantly higher mortality (HR: 2.61; 95% CI: 1.42–4.78; p = 0.0013). In the METABRIC cohort, age at diagnosis, HER2 status, and Nottingham Prognostic Index (NPI) showed strong survival separation (p < 0.001). The high-risk group had an HR of 2.73 (95% CI: 2.34–3.19) and a significantly shorter median survival than the low-risk group (104.7 vs. 252.3 months), a 12.3-year reduction in median survival. This prognostic separation underscores the predictive power of the selected baseline variables. Performance remained stable across cohorts, with C-index values (0.665–0.794) consistent with standard clinical benchmarks.

Conclusions: Integrating cross-validated machine learning feature selection with Cox-based survival analysis enables stable and clinically interpretable risk stratification across diseases. By translating ML-selected predictors into hazard ratios and absolute survival differences, this framework provides a reproducible and clinically grounded approach for survival risk assessment.
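
The abstract reports three kinds of quantities: Kaplan–Meier median survival, a concordance index (C-index), and the gap in median survival between risk groups. The sketch below shows one way these could be computed in plain Python. It is not the authors' code. The function names and the six-patient toy dataset are invented for illustration, and the C-index here is a simplified Harrell's C that skips pairs with tied times. Only the two median survival values (104.7 and 252.3 months) come from the abstract.

```python
from itertools import combinations


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which S(t) drops to 0.5 or below, or None if it never does."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's C: share of comparable pairs ranked correctly."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter follow-up is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    return concordant / comparable


# Hypothetical toy data (NOT from the study): follow-up in months,
# event flag (1 = death, 0 = censored), and a model risk score.
times = [5, 8, 12, 20, 30, 45]
events = [1, 1, 0, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.15, 0.2, 0.1]

toy_median = median_survival(kaplan_meier(times, events))
toy_c_index = harrell_c_index(times, events, risk)

# Median survival gap reported in the abstract for METABRIC, in months and years.
gap_months = 252.3 - 104.7
gap_years = gap_months / 12

print(toy_median, round(toy_c_index, 3), round(gap_months, 1), round(gap_years, 1))
```

In practice a maintained survival library would be the more robust choice, since it handles tied times and confidence intervals; the pure-Python version is kept only to make the definitions explicit.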

Version published to 10.3390/diagnostics16050790
Mar 6, 2026
Version published to 10.20944/preprints202601.2325.v1
Jan 29, 2026

Related articles

Benchmarking Ensemble Machine Learning Algorithms for the Early Prediction of Stroke in Imbalanced Clinical Cohorts: A Comparative Analysis and Decision Curve Assessment

This article has 2 authors:
1. Ibrahim Ibrahim Shuaibu
2. Yousaf Hussain
This article has no evaluations. Latest version: Jan 22, 2026

Machine learning models for predicting severe clinical events in hospitalized patients with coronary artery disease

This article has 16 authors:
1. Hao Liu
2. Meijun Liu
3. Xinmiao Guan
4. Feng Cao
5. Changhao Liang
6. Zhongwen Qi
7. Jiaqi Hui
8. Junnan Zhao
9. Jingli Xing
10. Jianguo Zhou
11. Dong Zhang
12. Lei Liu
13. Xiaoliang Hao
14. Minjing Luo
15. Fengqin Xu
16. Yutong Fei
This article has no evaluations. Latest version: Jan 12, 2026

A Multicenter, Interpretable Machine Learning-Based Survival Model for Predicting 28-Day Mortality Risk in Sepsis Patients with Heart Failure: A Retrospective Cohort Study and Performance Comparison with the SOFA Score

This article has 9 authors:
1. Yucan Zhou
2. Jian Wang
3. Yunchong Li
4. Jiahao Zou
5. Zhaoxin Huang
6. Yue Li
7. Mengyuan Hou
8. Zhibin Ma
9. Chunlong Liu
This article has no evaluations. Latest version: Mar 9, 2026
