Explainable Machine Learning for Preoperative Relapse Prediction in Molecularly Stratified Endometrial Cancer: A Single-Center Finnish Cohort Study
Abstract
Relapse risk in endometrial carcinoma (EC) is strongly influenced by molecular subtype, yet current WHO/ESGO classifications rely on postoperative data, limiting their utility for preoperative decision-making. We developed and compared interpretable machine learning (ML) models to predict relapse timing (none, ≤6 months, >6 months) using exclusively preoperative multimodal data. In a retrospective cohort of 784 EC patients, we integrated clinicopathological, molecular, immunohistochemical, and systemic biomarkers and constructed four feature strategies: (1) Traditional (clinicopathology), (2) ESGO (guideline risk groups), (3) TP53 + MMRd (high-risk biology), and (4) POLE (low-risk biology). Four classifiers, Random Forest (RF), Support Vector Machine (SVM), k-Nearest Neighbors (KNN), and Gradient Boosting (GBM), were trained with leakage-safe preprocessing and in-fold resampling. Performance was evaluated via area under the curve (AUC), accuracy, recall, and F1 score, and interpretability was assessed via SHapley Additive exPlanations (SHAP). The RF-based Traditional model achieved the highest overall performance (F1 = 0.895, AUC = 0.84), while the GBM-based POLE model showed superior sensitivity (F1 = 0.886, AUC = 0.842). SHAP identified ARID1A loss, elevated CA125, thrombocytosis, and p16 expression among the key predictors of relapse, while high-risk features shared across models included advanced stage, deeper myometrial invasion, elevated CA125, and positive cytology. These biologically coherent, explainable predictions support individualized risk stratification and may enhance preoperative decision-making, particularly for aggressive histology and high-risk molecular subtypes.
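The abstract refers to leakage-safe preprocessing and in-fold resampling without giving implementation details. The sketch below illustrates what those terms generally mean and is not the authors' pipeline. Within each cross-validation fold, scaling statistics are estimated from the training portion only, and class balancing (here plain random oversampling, an assumed technique) is applied to the training portion only, so the held-out fold keeps its natural class mix. The function names, the single synthetic feature, and the class counts are all hypothetical and are not taken from the study.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds, spreading each class across folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def oversample(train_idx, labels, rng):
    """Randomly duplicate minority-class indices until every class matches the largest."""
    by_class = defaultdict(list)
    for idx in train_idx:
        by_class[labels[idx]].append(idx)
    target = max(len(indices) for indices in by_class.values())
    balanced = []
    for indices in by_class.values():
        balanced.extend(indices)
        balanced.extend(rng.choices(indices, k=target - len(indices)))
    return balanced


def leakage_safe_splits(features, labels, k=5, seed=0):
    """Yield (balanced_train_idx, test_idx, (mean, std)) for each fold."""
    rng = random.Random(seed)
    folds = stratified_folds(labels, k, rng)
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        # Scaling statistics are computed from the training fold only.
        train_values = [features[idx] for idx in train_idx]
        mean = sum(train_values) / len(train_values)
        var = sum((v - mean) ** 2 for v in train_values) / len(train_values)
        std = var ** 0.5 or 1.0
        # Resampling touches the training fold only; the test fold is left as-is.
        yield oversample(train_idx, labels, rng), test_idx, (mean, std)


# Synthetic toy data (NOT study data): three relapse-timing classes, one numeric feature.
labels = ["none"] * 30 + ["early"] * 6 + ["late"] * 9
data_rng = random.Random(1)
features = [data_rng.gauss(0.0, 1.0) for _ in labels]

splits = list(leakage_safe_splits(features, labels))
for train_bal, test_idx, (mean, std) in splits:
    # Class counts in the balanced training fold, and the held-out fold size.
    print(Counter(labels[i] for i in train_bal), len(test_idx))
```

In practice the same guarantee is usually obtained by wrapping the scaler, the resampler, and the classifier in a single pipeline object that is refit inside every fold, rather than by hand-written loops as in this sketch.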