Leveraging Survival Analysis and Machine Learning for Accurate Prediction of Breast Cancer Recurrence and Metastasis
Abstract
Breast cancer, with its high incidence and mortality worldwide, calls for early prediction of local and distant recurrence to improve treatment outcomes. This study develops and validates predictive models for breast cancer recurrence and metastasis using recurrence-free survival (RFS) analysis and machine learning techniques. We merged datasets from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), Memorial Sloan Kettering Cancer Center (MSK), Duke University, and the SEER program, creating a combined dataset of 190,789 rows and 23 columns. Our methodology used three predictive strategies: assessing recurrence risk, differentiating local from distant recurrence, and identifying potential metastatic sites. Key prognostic factors were identified through survival analysis. LightGBM, XGBoost, and Random Forest models were trained and then validated on data from the Baheya Foundation. The models performed strongly: the survival analysis achieved a C-index of 0.837, and the LightGBM model reached an AUC of 0.92 in predicting recurrence, while the XGBoost and Random Forest models distinguished local from distant recurrence with up to 86% accuracy and effectively predicted specific metastatic sites. This study highlights the potential of machine learning to advance breast cancer management and sets a new benchmark for predictive analytics in this area. Future research will integrate genetic data to further enhance these models.
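The C-index of 0.837 reported above measures how often a model ranks patients' recurrence risk in the same order as their observed recurrence times, while accounting for censored follow-up. The abstract does not say which implementation the authors used, so the following is only a minimal, dependency-free sketch of Harrell's concordance index under stated assumptions: right-censored data, a higher score meaning higher predicted risk, and pairs with tied follow-up times skipped for simplicity. The function name `harrell_c_index` and the toy values are hypothetical and purely illustrative.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Sketch of Harrell's concordance index for right-censored data.

    times:       observed follow-up times (to recurrence or censoring)
    events:      1 if recurrence was observed, 0 if the patient was censored
    risk_scores: model output, where a higher value means higher predicted risk
    Simplification (assumption): pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times: skipped in this simplified version
        # a is the patient with the shorter follow-up time, b the longer one
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # The pair is comparable only if the earlier time is an observed event
        if not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier recurrence was given the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: four patients, the third one censored (hypothetical data)
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [0.9, 0.7, 0.4, 0.2]))  # 1.0 (perfect ranking)
print(harrell_c_index(times, events, [0.2, 0.7, 0.4, 0.9]))  # 0.2 (mostly reversed)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.837 means the model orders most comparable patient pairs correctly. Library implementations such as the one in `lifelines` handle tied times more carefully than this sketch does.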