Prediction of distant metastasis in renal cell carcinoma using machine learning algorithms on the basis of the SEER database and a Chinese population

Abstract

Objectives: Few machine learning (ML) studies have investigated the prediction of distant metastasis in patients with renal cell carcinoma (RCC). This study aimed to develop and validate predictive models based on ML algorithms for RCC patients with distant metastasis. Methods: We extracted RCC data from the SEER database between 2004 and 2015 (n=192,912) and the Chinese National Cancer Center (CNCC) database between 2010 and 2020 (n=3034). Seven different algorithms were applied to predict distant metastasis in RCC. Fivefold cross-validation was employed for model construction. The data were analyzed using Python on four versions of the dataset: incomplete data, complete data, upsampled data and downsampled data. Results: After data cleaning and screening, 121,741 cases from the SEER dataset and 2803 cases from the CNCC external test set were retained. For the incomplete data, the neural network model [area under the curve (AUC) 95% confidence interval (CI) of the external data: 0.7467±0.0573] achieved the highest accuracy. For the complete data, the support vector machine (SVM) model achieved the highest accuracy, with an AUC 95% CI of 0.8221±0.0485. The disparity between positive and negative samples varied significantly across the datasets, so upsampling and downsampling analyses were also conducted. For the upsampled data, the extreme gradient boosting (XGBoost) model had the highest accuracy, with an AUC 95% CI for the external data of 0.8162±0.0558. For the downsampled data, the SVM model achieved the highest accuracy, with an AUC 95% CI of 0.8274±0.0546 for the external data. Conclusions: Our study shows that ML algorithms can effectively predict distant metastasis in patients with RCC. ML models have favorable application prospects in clinical practice.
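
The abstract names three generic techniques without detail: class rebalancing by upsampling or downsampling, fivefold cross-validation, and AUC as the evaluation metric. The sketch below illustrates each on toy labels using only the Python standard library. It is not the authors' pipeline; all function names, the 10:90 class ratio, the random seeds, and the choice of stratified folds are assumptions made for illustration.

```python
import random

def split_by_label(labels):
    """Return (positive indices, negative indices) for binary 0/1 labels."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return pos, neg

def downsample(labels, seed=0):
    """Indices after randomly cutting the majority class to the minority size."""
    rng = random.Random(seed)
    pos, neg = split_by_label(labels)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return sorted(minority + rng.sample(majority, len(minority)))

def upsample(labels, seed=0):
    """Indices after redrawing minority cases (with replacement) up to the majority size."""
    rng = random.Random(seed)
    pos, neg = split_by_label(labels)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return sorted(majority + minority + extra)

def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds, dealing each class out round-robin after a shuffle."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for group in split_by_label(labels):
        group = group[:]
        rng.shuffle(group)
        for j, idx in enumerate(group):
            folds[j % k].append(idx)
    return folds

def auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (assumed): 10 metastatic cases (1) among 100 patients.
labels = [1] * 10 + [0] * 90
down = downsample(labels)             # 10 positives + 10 negatives
up = upsample(labels)                 # 90 positives (with repeats) + 90 negatives
folds = stratified_folds(labels, k=5) # 5 folds, each with 2 positives and 18 negatives
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In a sketch like this, resampling would normally be applied only to the training folds, so that validation folds and the external test set keep their original class ratio; whether the study proceeded this way is not stated in the abstract.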
