Benchmarking Machine Learning against Mixture Cure Models for Personalized Recurrence and Survival Prediction in Ovarian Cancer: A 24-Year Retrospective Cohort Study
Abstract
Background: Accurate survival prediction in ovarian cancer is challenging because of biological heterogeneity and the cure phenomenon, in which a subgroup of patients achieves long-term remission. This study compared the predictive performance of a machine learning algorithm, the Random Survival Forest, with that of a statistical Mixture Cure Model in a cohort spanning 24 years, addressing a gap in direct comparative evaluations.
Methods: Data from 352 ovarian cancer patients (2000-2024) were analyzed, with Overall Survival (OS) and Progression-Free Survival (PFS) as the primary outcomes. We implemented two modeling approaches: a semi-parametric Mixture Cure Model (with a logistic cure component and a Cox latency component) and a non-parametric Random Survival Forest, an ensemble of 1000 survival trees. Model performance was evaluated with the concordance index (C-index), the integrated Brier score, and time-dependent AUC, estimated via cross-validation. Variable importance was assessed, and non-linear effects were explored through partial dependence plots.
Results: The mean age at diagnosis was 45.97 ± 15.22 years, and 59.7% of patients presented at an advanced stage (FIGO III/IV). During 273 months of follow-up, 115 recurrences (33%) and 55 deaths (16%) occurred. Variable importance analysis showed that response to neoadjuvant therapy and surgical quality were the strongest predictors of recurrence, whereas age at diagnosis and baseline CA125 level were the main predictors of overall survival. Partial dependence plots revealed a non-linear threshold effect for CA125. The Random Survival Forest significantly outperformed the Mixture Cure Model in predicting recurrence (C-index = 0.84, 95% CI: 0.80–0.88 vs. 0.78, 95% CI: 0.74–0.82; p = 0.02). Its discrimination for death was numerically higher, but the difference was not statistically significant (C-index = 0.78, 95% CI: 0.74–0.82 vs. 0.75, 95% CI: 0.71–0.79; p = 0.15). The Mixture Cure Model confirmed a survival plateau after 150 months for OS, indicating a significant cure fraction (~85%) in this cohort, whereas no cure fraction was identified for PFS.
Conclusion: Our findings demonstrate that machine learning, owing to its ability to uncover non-linear interactions, is a superior tool for precision oncology. Distinguishing treatment-focused drivers of recurrence from host-focused drivers of mortality provides a new roadmap for personalized patient management and for optimizing long-term follow-up protocols.
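The two quantities the abstract relies on most, the cure-fraction plateau and the C-index, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The population survival under a mixture cure model is written in its standard form, S(t) = pi + (1 - pi) * S_u(t), where pi is the cure probability and S_u is the survival of uncured patients. The exponential form of S_u and the value of `hazard_rate` are assumptions made only for illustration (the study uses a Cox latency component). The C-index is computed in Harrell's pairwise form, which is also an assumption, since the abstract does not name the estimator. The toy data are invented and do not come from the cohort.

```python
import math


def mixture_cure_survival(t, cure_prob, hazard_rate):
    """Population survival S(t) = pi + (1 - pi) * S_u(t).

    S_u is modeled as exponential here purely for illustration;
    hazard_rate is a hypothetical parameter, not a study estimate.
    """
    uncured_survival = math.exp(-hazard_rate * t)
    return cure_prob + (1.0 - cure_prob) * uncured_survival


def harrell_c_index(times, events, risk_scores):
    """Share of comparable pairs in which the patient with the earlier
    observed event has the higher predicted risk (risk ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: i had an observed event strictly before j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration only (months, event flag, predicted risk).
times = [12, 30, 45, 60, 150]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.7, 0.4, 0.8, 0.1]

c_index = harrell_c_index(times, events, risk)

# With a cure probability of 0.85 (the approximate OS cure fraction reported
# above), the curve flattens toward 0.85 at long follow-up.
plateau = mixture_cure_survival(t=273, cure_prob=0.85, hazard_rate=0.05)
```

In this sketch the survival curve starts at 1 and levels off at the cure probability, which is the plateau pattern the abstract reports for OS. A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.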