A Multicenter Machine Learning Model Incorporating Circulating Tumor Cells for Postoperative Recurrence Prediction in Localized Renal Cell Carcinoma
Abstract
Background: Postoperative recurrence prediction in localized renal cell carcinoma (LRCC) remains clinically challenging. Circulating tumor cells (CTCs) have emerged as promising, minimally invasive biomarkers for disease monitoring and prognostication. This study aimed to develop and interpret a machine learning model to predict recurrence risk in LRCC, and to translate the model into a clinically applicable scoring system.

Methods: A multicenter retrospective cohort of 326 patients with LRCC, collected from 14 hospitals across 7 provinces in China, was randomly divided into training and validation sets (7:3 ratio). Baseline clinicopathological characteristics, CTC subtype counts, and changes in those counts were collected. Highly correlated features were removed prior to modeling. Key predictors were selected using LASSO, random forest importance, and recursive feature elimination. Six machine learning algorithms (logistic regression, random forest, support vector machine, XGBoost, naive Bayes, and multilayer perceptron) were trained with fivefold cross-validation, and performance was evaluated by AUC. The best-performing model was interpreted using SHapley Additive exPlanations (SHAP) and translated into a simple threshold-based clinical score integrating five variables (ΔMCTC, MCTC, EpiCTC, ΔEpiCTC, and RENAL score). Patients were stratified into low-risk (0–2 points) and high-risk (3–5 points) groups, and recurrence-free survival was compared between groups using Kaplan–Meier curves. The predictive accuracy of the new score was further compared with that of UISS and SSIGN using 1- and 5-year ROC analyses in the validation cohort.

Results: Among all models, the random forest (RF) model achieved the highest predictive performance (AUC = 0.850). SHAP analysis identified changes in MCTCs and epithelial CTCs as the most important predictors, followed by baseline MCTCs, baseline epithelial CTCs, and RENAL score. Using RF-derived optimal thresholds, patients received 1 point for each variable exceeding its cutoff and 0 points otherwise. Total scores stratified patients into low-risk (0–2 points) and high-risk (3–5 points) groups, and the high-risk group had significantly shorter recurrence-free survival than the low-risk group (p < 0.001). The clinical risk score achieved a higher AUC than conventional prognostic scores, including UISS and SSIGN, in the validation cohort.

Conclusions: A machine learning model integrating CTC metrics and anatomical factors accurately predicted LRCC recurrence and outperformed existing prognostic systems. Its simplified, threshold-based clinical score offers a practical approach to individualized postoperative risk assessment.

Trial registration: This study was approved by the Scientific Ethics Committee of the Department of Medicine of Xi’an Jiaotong University (No. 2021033) and is registered with the Chinese Clinical Trial Registry (ChiCTR2000035394).
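The threshold-based score described in the abstract (1 point for each of the five variables exceeding its cutoff, then low risk for 0–2 points and high risk for 3–5 points) could be sketched roughly as below. This is only an illustration, not the authors' implementation: the abstract does not report the RF-derived cutoffs, so the values in `HYPOTHETICAL_CUTOFFS` are placeholders, and the field names and the functions `clinical_score` and `risk_group` are assumptions introduced here.

```python
from typing import Mapping

# Placeholder cutoffs for illustration only. The abstract does not report
# the RF-derived thresholds, so these numbers are NOT from the study.
HYPOTHETICAL_CUTOFFS = {
    "delta_mctc": 1.0,     # change in MCTC count
    "mctc": 2.0,           # baseline MCTC count
    "epi_ctc": 3.0,        # baseline epithelial CTC count
    "delta_epi_ctc": 1.0,  # change in epithelial CTC count
    "renal_score": 7.0,    # RENAL score
}


def clinical_score(
    patient: Mapping[str, float],
    cutoffs: Mapping[str, float] = HYPOTHETICAL_CUTOFFS,
) -> int:
    """Give 1 point per variable that exceeds its cutoff, 0 otherwise."""
    return sum(1 for name, cutoff in cutoffs.items() if patient[name] > cutoff)


def risk_group(score: int) -> str:
    """Map a total score to the abstract's groups: 0-2 low, 3-5 high."""
    if not 0 <= score <= 5:
        raise ValueError("score must be between 0 and 5")
    return "low" if score <= 2 else "high"


# Hypothetical patient: three of the five variables exceed their cutoffs.
example_patient = {
    "delta_mctc": 2.0,
    "mctc": 1.0,
    "epi_ctc": 4.0,
    "delta_epi_ctc": 0.0,
    "renal_score": 9.0,
}
example_score = clinical_score(example_patient)
example_group = risk_group(example_score)
```

With the placeholder cutoffs, the hypothetical patient above exceeds the thresholds for ΔMCTC, EpiCTC, and RENAL score, which gives 3 points and places the patient in the high-risk group. Substituting the study's actual cutoffs would only require changing the dictionary.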