Clinicodemographic Prediction of Overall Survival in Patients with Head and Neck Merkel Cell Carcinoma: A Machine Learning Approach
Abstract
Background
Merkel cell carcinoma (MCC) is a rare cutaneous neuroendocrine malignancy with a higher case-fatality rate than melanoma. The prognosis of MCC is complex and depends on many factors. In this study, we aimed to develop predictive survival models using machine learning (ML) algorithms and statistical techniques for patients with head and neck MCC.
Methodology
Using a cohort of 1,372 patients diagnosed with MCC of the head and neck region in the United States between 2000 and 2019, sourced from the Surveillance, Epidemiology, and End Results (SEER) Program, we developed and evaluated a Cox proportional hazards (CPH) regression model, nine classification and regression ML models, and two ML-based survival models. Models were built with a total of 20 features, including demographic, cancer-related, and treatment/surgery-related variables. Pre-processing, hyperparameter tuning, classification, regression, survival analyses, and model evaluations were performed using the ‘scikit-learn’, ‘scikit-survival’, and ‘lifelines’ packages in Python 3.8.0.
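The abstract does not say how the 2-year and 5-year classification targets were derived from the follow-up data. One common convention, shown here only as a hypothetical sketch, is to label a patient as a survivor if follow-up extends to the horizon, as a non-survivor if death occurred before it, and to leave the label undefined when the patient was censored before the horizon. The function name `horizon_labels` and the use of months as the time unit are assumptions made for illustration, not details taken from the study.

```python
def horizon_labels(followup_months, died, horizon_months):
    """Build fixed-horizon survival labels from censored follow-up data.

    Returns one label per patient:
      1    -> followed up to (or past) the horizon, so known to have survived it
      0    -> died before the horizon
      None -> censored before the horizon, so the outcome at the horizon is unknown
    """
    labels = []
    for months, death_observed in zip(followup_months, died):
        if months >= horizon_months:
            labels.append(1)
        elif death_observed:
            labels.append(0)
        else:
            labels.append(None)
    return labels


# Hypothetical toy data: follow-up time in months and whether death was observed.
followup = [30, 10, 12, 70]
death = [True, True, False, False]

two_year = horizon_labels(followup, death, 24)   # [1, 0, None, 1]
five_year = horizon_labels(followup, death, 60)  # [0, 0, None, 1]
```

Under this convention, patients labeled `None` would typically be excluded before training a classifier for that horizon, which is why the usable sample can differ between the 2-year and 5-year tasks.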
Results
The mean age of patients was (66.0). Most cases were diagnosed in stage I (n = 723, 52.7%). The multivariate CPH model yielded a concordance index (C-index) of 0.700 on the test set, outperforming both the random survival forest (C-index = 0.591) and survival tree (C-index = 0.582) algorithms. Of the nine classification models, the gradient boosting classifier gave the most accurate predictions of 2-year (AUC = 0.75; accuracy = 0.71) and 5-year (AUC = 0.75; accuracy = 0.68) survival. Additionally, the ridge- and lasso-regularized linear models gave the most accurate regression results (RMSE = 1182.84, R² = 0.2259 and RMSE = 1184.70, R² = 0.2234, respectively), and the gradient boosting regressor gave acceptable predictions (RMSE = 1189.64, R² = 0.2170) on the test sets. According to the Shapley Additive Explanations (SHAP) value analysis, the most important feature in these regression models was age, followed by sex and AJCC stage.
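For readers less familiar with the metrics reported above, the following is a minimal, dependency-free sketch of how a C-index, an RMSE, and an R² can be computed. It is not the authors' evaluation code (the study used library implementations), the toy numbers are invented, and the choice to skip pairs with tied follow-up times is a simplifying assumption.

```python
from itertools import combinations
from math import sqrt


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk.

    A pair is comparable when the patient with the shorter follow-up had an
    observed event. Pairs with tied follow-up times are skipped (simplification).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # `first` is the patient with the shorter follow-up time.
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # censored first: we cannot tell who died earlier
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0   # higher predicted risk died earlier
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual variation / total variation."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented toy example: 4 patients, one censored (event = 0).
c_index = concordance_index([5, 10, 15, 20], [1, 0, 1, 1], [0.9, 0.8, 0.3, 0.4])
# c_index == 0.75: three of the four comparable pairs are ranked correctly.
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.700 for the CPH model indicates moderate discrimination. Likewise, an R² near 0.22 means the regression models explain roughly a fifth of the variance in the survival-time target.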
Conclusions
This study found that machine learning and statistical models can provide reliable survival predictions for head and neck Merkel cell carcinoma, with gradient boosting classifiers in particular giving acceptable performance, especially for 2-year survival.