Development and Internal Validation of an Explainable Machine-Learning Model to Predict 3-Year Overall Survival Rate After Radical Cystectomy
Abstract
Background: This study aimed to develop and internally validate an explainable machine-learning model that uses routinely available clinicopathologic and laboratory variables to predict 3-year overall survival (OS) after radical cystectomy.

Methods: We retrospectively included 300 patients who underwent radical cystectomy between January 2018 and December 2022. Predictors were selected in the training set using LASSO logistic regression followed by random-forest recursive feature elimination, which retained ten variables. Seven algorithms (logistic regression, k-nearest neighbors [KNN], support vector machine with a radial basis function kernel [SVM-RBF], random forest, XGBoost, LightGBM, and CatBoost) were trained on a training set comprising 70% of patients and evaluated on an internal validation set comprising the remaining 30%. Discrimination, calibration, and clinical utility were assessed, and the final model was interpreted using SHapley Additive exPlanations (SHAP).

Results: In internal validation, areas under the receiver operating characteristic curve (AUCs) ranged from 0.834 to 0.950 across the seven models. CatBoost achieved the best overall classification performance across the reported metrics (AUC = 0.931, accuracy = 0.862, sensitivity = 0.647, specificity = 0.951, positive predictive value [PPV] = 0.846, and negative predictive value [NPV] = 0.867), although its AUC was not the highest of the seven. SHAP analyses identified tumor stage (T, N, and M stages) as the dominant driver of predicted risk, with additional contributions from age, body mass index (BMI), albumin, globulin, lymphocyte count, platelet count, and preoperative creatinine.

Conclusions: We developed an internally validated, SHAP-interpretable CatBoost model for predicting 3-year OS after radical cystectomy. External validation and recalibration in independent cohorts are required before clinical use.
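The Results report five threshold-dependent metrics for CatBoost. As a minimal sketch (not the authors' code), each can be computed from the four cells of a confusion matrix at a fixed probability cutoff. The counts below are hypothetical, and treating the 3-year event as the positive class is an assumption, since the abstract states neither the event coding nor the cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts at one probability cutoff; 'positive' = 3-year event (assumed coding)."""
    tp: int  # predicted positive, observed positive
    fp: int  # predicted positive, observed negative
    fn: int  # predicted negative, observed positive
    tn: int  # predicted negative, observed negative


def classification_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Accuracy, sensitivity, specificity, PPV, and NPV from a confusion matrix."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "sensitivity": cm.tp / (cm.tp + cm.fn),  # true-positive rate
        "specificity": cm.tn / (cm.tn + cm.fp),  # true-negative rate
        "ppv": cm.tp / (cm.tp + cm.fp),          # positive predictive value
        "npv": cm.tn / (cm.tn + cm.fn),          # negative predictive value
    }


# Hypothetical counts for illustration only; the abstract reports no confusion matrix.
metrics = classification_metrics(ConfusionMatrix(tp=20, fp=5, fn=10, tn=55))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Read against the reported values, a sensitivity of 0.647 with a specificity of 0.951 means that, at the cutoff used, the model correctly classifies about 95% of negative cases but misses roughly one in three positive cases (1 - 0.647 ≈ 0.35).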
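The AUC values in the Results summarize discrimination without fixing any cutoff. A minimal stdlib sketch of the rank-based (Mann-Whitney) formulation follows: the AUC is the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case, with ties counted as one half. The helper name `roc_auc` and the scores are hypothetical, not taken from the study.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as one half (Mann-Whitney U formulation). The double loop is
    O(n_pos * n_neg), which is fine for validation sets of this size.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for six patients (1 = event, assumed coding).
y = [1, 1, 0, 0, 1, 0]
p = [0.90, 0.40, 0.35, 0.10, 0.80, 0.50]
print(round(roc_auc(y, p), 3))  # 0.889
```

Because AUC ignores the cutoff while accuracy, sensitivity, specificity, PPV, and NPV depend on it, rankings by the two kinds of metric need not agree. This is consistent with CatBoost being selected with an AUC of 0.931 when the highest reported AUC was 0.950.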
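The abstract states that calibration was assessed but does not say how. Two common checks, shown here as an assumption rather than as the study's method, are the Brier score and a grouped comparison of mean predicted risk with the observed event rate (the numeric core of a calibration plot). The function names and data are hypothetical.

```python
def brier_score(labels: list[int], probs: list[float]) -> float:
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def calibration_table(labels: list[int], probs: list[float], n_groups: int = 3):
    """Sort patients by predicted risk, split them into n_groups near-equal groups, and
    return (mean predicted risk, observed event rate, group size) for each group."""
    n = len(labels)
    if n < n_groups:
        raise ValueError("need at least one patient per group")
    order = sorted(range(n), key=lambda i: probs[i])
    rows = []
    for g in range(n_groups):
        # Integer bounds guarantee contiguous, non-empty groups covering every patient.
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        mean_pred = sum(probs[i] for i in idx) / len(idx)
        observed = sum(labels[i] for i in idx) / len(idx)
        rows.append((mean_pred, observed, len(idx)))
    return rows


# Hypothetical outcomes (1 = event, assumed coding) and predicted risks for six patients.
y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.7, 0.9]
print(f"Brier score: {brier_score(y, p):.3f}")
for mean_pred, observed, size in calibration_table(y, p):
    print(f"predicted {mean_pred:.2f} vs observed {observed:.2f} (n={size})")
```

Recalibration in an external cohort, as the Conclusions call for, adjusts predicted probabilities so that these two columns agree in the new population; a strictly increasing recalibration map preserves the ranking of patients and therefore leaves the AUC unchanged.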
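Clinical utility of a risk model is often evaluated with decision curve analysis, but the abstract does not name its method, so the following is an assumed illustration only. Net benefit at a threshold probability pt is TP/n - (FP/n) * pt / (1 - pt), compared against a "treat all" strategy and a "treat none" strategy (net benefit 0). Counting predictions at or above the threshold as positive is our convention, and the data are hypothetical.

```python
def net_benefit(labels: list[int], probs: list[float], threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # The threshold odds are the weight placed on each false positive relative to a true positive.
    odds = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(labels: list[int], threshold: float) -> float:
    """Net benefit of treating every patient (the 'treat all' reference curve)."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical outcomes (1 = event, assumed coding) and predicted risks.
y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.7, 0.9]
for pt in (0.1, 0.25, 0.5):
    print(pt, round(net_benefit(y, p, pt), 3), round(net_benefit_treat_all(y, pt), 3))
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all value and zero; in this toy example that holds at 0.25 and 0.5 but not at 0.1, where the model flags every patient and so coincides with treat-all.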
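SHAP attributes each individual prediction to the input features using Shapley values. For intuition only, the sketch below computes exact Shapley values by brute-force enumeration of feature coalitions, replacing absent features with a single baseline value. This is not how the `shap` library explains CatBoost models (its tree explainer uses the polynomial-time TreeSHAP algorithm), and the three-feature toy model is hypothetical, not a model from the study.

```python
from collections.abc import Callable
from itertools import combinations
from math import factorial


def shapley_values(model: Callable[[list[float]], float],
                   x: list[float], baseline: list[float]) -> list[float]:
    """Exact Shapley values for one prediction by enumerating all feature coalitions.

    Features outside a coalition take their baseline value. Cost grows as 2**n_features,
    so this is a teaching sketch only.
    """
    n = len(x)

    def value(coalition: set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: list[float]) -> float:
    # Hypothetical risk score: two main effects plus an interaction between features 0 and 2.
    return 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 3) for v in phi])  # [0.6, 0.3, 0.1]
```

The attributions sum to the difference between the prediction and the baseline prediction (the efficiency property), and the interaction term is split equally between the two features that form it.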