A Unified Framework for Survival Prediction: Combining Machine Learning Feature Selection with Traditional Survival Analysis in Heart Failure and METABRIC Breast Cancer
Abstract
Background: The clinical use of machine learning (ML) in survival analysis is often limited by the “black box” nature of complex algorithms, which makes their results difficult to interpret in practice. In this study, we propose a unified and clinically grounded framework that integrates ML-based feature selection with traditional survival analysis. This approach aims to bridge the gap between strong predictive performance and clear, clinically meaningful interpretation.

Methods: High-impact prognostic clinical features were identified using three ML models, gradient-boosted Cox models (GBM-Cox), random survival forests (RSF), and LASSO-penalized Cox regression (LASSO-Cox), with 5-fold stratified cross-validation. The selected features were subsequently validated using Cox proportional hazards regression and Kaplan–Meier analysis. The framework was evaluated across two distinct disease domains, Heart Failure and the METABRIC breast cancer cohort, to assess robustness and generalizability.

Results: In the Heart Failure dataset, age group, serum creatinine, and blood pressure stratified patients into distinct risk groups. The high-risk group had significantly higher mortality (HR: 2.61; 95% CI: 1.42–4.78; p = 0.0013). In the METABRIC cohort, age at diagnosis, HER2 status, and Nottingham Prognostic Index (NPI) showed strong survival separation (p < 0.001). The high-risk group had an HR of 2.73 (95% CI: 2.34–3.19) and a significantly shorter median survival (104.7 vs. 252.3 months), a 12.3-year reduction in median survival compared with the low-risk group. This prognostic separation underscores the predictive value of the selected baseline variables. Performance remained stable across cohorts, with C-index values (0.665–0.794) consistent with standard clinical benchmarks.

Conclusions: Integrating cross-validated machine learning feature selection with Cox-based survival analysis enables stable and clinically interpretable risk stratification across diseases. By translating ML-selected predictors into hazard ratios and absolute survival differences, this framework provides a reproducible and clinically grounded approach for survival risk assessment.
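
The two summary quantities reported in the Results, Kaplan–Meier median survival and the C-index, can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names `km_median` and `harrell_c`, the toy data, and the handling of tied times (simply skipped) are assumptions made for illustration, and a real analysis would normally rely on an established survival library.

```python
from itertools import combinations


def km_median(times, events):
    """Kaplan-Meier median: earliest event time at which the estimated S(t) <= 0.5.

    times  -- observed follow-up times
    events -- 1 if the event (death) was observed, 0 if censored
    Returns None when the survival curve never drops to 0.5.
    """
    at_risk = len(times)
    surv = 1.0
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censorings at t
        if deaths:
            surv *= 1.0 - deaths / at_risk
            if surv <= 0.5:
                return t
        at_risk -= leaving
    return None


def harrell_c(times, events, risk):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk score.

    A pair is comparable when the subject with the shorter time had an observed
    event. Pairs with tied times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        a, b = (i, j) if times[i] <= times[j] else (j, i)  # a has the shorter time
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0   # higher risk score failed first
        elif risk[a] == risk[b]:
            concordant += 0.5   # tied scores count as half
    return concordant / comparable


# Hypothetical toy cohort, not data from the study.
times = [5, 8, 12, 20, 24, 30]
events = [1, 1, 0, 1, 0, 1]
risk = [0.9, 0.3, 0.4, 0.5, 0.2, 0.1]

median = km_median(times, events)
c_index = harrell_c(times, events, risk)

# Converting the reported METABRIC median-survival gap from months to years.
gap_years = (252.3 - 104.7) / 12
```

On the toy cohort, 9 of the 11 comparable pairs are ordered correctly, so `c_index` is 9/11. The final line reproduces the arithmetic behind the 12.3-year figure quoted in the Results.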