Predicting Optimal Colorectal Cancer Treatments Across Age Groups Using Machine Learning
Abstract
Colorectal cancer (CRC) is the third most common cancer and the most common malignancy of the digestive tract, accounting for 13% of all malignant tumors. It is the second leading cause of cancer death worldwide, in both developed and developing countries, and treatment outcomes vary significantly across age groups. This study employed multiple machine learning (ML) techniques to predict the most effective treatment methods for CRC patients based on age-specific hazard ratios (HRs). Using data from the Surveillance, Epidemiology, and End Results (SEER) database, we analyzed 72,341 CRC patients treated with total mesorectal excision (TME), chemotherapy (CT), radiotherapy (RT), or neoadjuvant radiotherapy (nRT). Models were validated with 10-fold stratified cross-validation, with class balancing via the Synthetic Minority Over-sampling Technique (SMOTE). The study identified treatment recommendations (non-RT/nRT/RT) stratified by age and CT status. Interpretable SHAP analysis identified T-stage (HR=1.41, p<0.001) and marital status as key predictors. Unlike previous studies focusing on general CRC treatment prediction, our work integrates age-stratified hazard ratio modeling with SHAP-based interpretability, enabling clinically actionable recommendations tailored to three age cohorts (≤50, 51–65, >65 years). This approach reduces misclassification errors by 15% compared to National Comprehensive Cancer Network (NCCN) guidelines (p=0.01). These findings highlight the potential of ML in personalizing CRC treatment strategies, thereby improving patient outcomes and reducing risks.
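The abstract does not say how cross-validation and SMOTE were combined, so the following is only a minimal sketch of one common arrangement, not the study's code: the data are split into stratified folds, and synthetic minority samples are generated from the training portion of each fold only, so that no synthetic point leaks into the held-out portion. It uses the Python standard library and a simplified SMOTE (linear interpolation between a minority sample and one of its nearest minority neighbours); the function names `stratified_folds`, `smote`, and `balanced_training_sets` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, rng):
    """Assign sample indices to k folds while preserving class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def smote(X, y, minority, n_new, k_neighbors, rng):
    """Simplified SMOTE: interpolate between a minority point and one of
    its k nearest minority neighbours (squared Euclidean distance)."""
    pts = [x for x, label in zip(X, y) if label == minority]
    if len(pts) < 2:
        return []  # nothing to interpolate with
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(pts)
        neighbours = sorted(
            (p for p in pts if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k_neighbors]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


def balanced_training_sets(X, y, k=10, k_neighbors=5, seed=0):
    """Yield (X_train, y_train, X_test, y_test) per fold, oversampling
    every non-majority class in the training part only."""
    rng = random.Random(seed)
    folds = stratified_folds(y, k, rng)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        X_tr = [X[j] for j in train_idx]
        y_tr = [y[j] for j in train_idx]
        counts = defaultdict(int)
        for label in y_tr:
            counts[label] += 1
        target = max(counts.values())
        for label, n in list(counts.items()):
            if n < target:
                new = smote(X_tr, y_tr, label, target - n, k_neighbors, rng)
                X_tr += new
                y_tr += [label] * len(new)
        yield X_tr, y_tr, [X[j] for j in test_idx], [y[j] for j in test_idx]
```

Each yielded training set can be passed to any classifier, with evaluation done on the untouched test portion. In practice a maintained implementation, such as the SMOTE class in the imbalanced-learn package combined with scikit-learn's stratified splitter, would normally be preferred over this hand-rolled version.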