Comparative Survival Analysis of ChatGPT o1, ChatGPT-4o, and Tumor Board Treatment Recommendations in Head and Neck Squamous Cell Carcinoma - A study of 1757 cases
Abstract
(1) Objectives: Artificial intelligence (AI) has demonstrated potential in supporting decision-making processes in oncology by assisting the multidisciplinary tumor board (MDT). This study evaluates the performance of two AI-based decision-support tools, the most recently introduced ChatGPT o1 and the established ChatGPT-4o, compared to MDT treatment recommendations in head and neck squamous cell carcinoma (HNSCC). Survival outcomes and treatment patterns were analyzed in a large publicly accessible patient cohort; (2) Materials and Methods: 1,843 HNSCC cases from the University of Michigan SPORE clinical outcomes analytic dataset were analyzed, and 1,757 patients were included in the final study. Treatment recommendations by ChatGPT o1, ChatGPT-4o, and the MDT were compared using clinical data from the cohort. Survival analysis was performed using Kaplan-Meier curves and Cox proportional hazards models, adjusting for confounders such as tumor stage, smoking status, body mass index (BMI), HPV status, and p16 expression. Subgroup analyses by tumor stage and location were conducted, and treatment modalities were compared across the three groups. For validation purposes, the TCGA HNSCC dataset of 528 cases was analyzed; (3) Results: ChatGPT o1 treatment recommendations yielded significantly better overall survival compared to both ChatGPT-4o and the MDT in unadjusted Kaplan-Meier analysis (log-rank test p < 0.0001). However, this survival advantage reflected a bias in tumor stage distribution, as ChatGPT o1 favored early-stage patients in its recommendations. After adjustment for covariates, no significant differences in survival were observed between ChatGPT o1, ChatGPT-4o, and the MDT. Tumor stage remained the strongest predictor of survival (HR: 1.34 per stage increment, 95% CI: 1.24–1.44, p < 0.0001).
Similar results were observed in the independent TCGA validation cohort; (4) Discussion: While ChatGPT o1's treatment recommendations yielded better survival outcomes in this study, these results were driven mostly by stage-based selection bias. After adjustment, no statistically significant survival advantage was observed among the three treatment groups. Differences in treatment modality preferences and patient selection point to potential AI-driven opportunities, including "outside-the-box" reasoning, for personalizing treatment strategies, and highlight the importance of stage-stratified analyses when comparing AI-based decision-support tools to traditional tumor boards. This study lays the foundation for the use of ChatGPT o1 in simulating clinical decision-making.
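
For readers unfamiliar with the unadjusted survival comparison mentioned in the Methods, the Kaplan-Meier (product-limit) estimate can be sketched in a few lines. This is only an illustrative sketch, not the study's analysis code: the function name `kaplan_meier` and the six-patient toy data are hypothetical, and the log-rank test and Cox adjustment for covariates are not shown.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct death time.

    durations: follow-up time per patient (any consistent unit)
    events:    1 if death was observed at that time, 0 if censored
    """
    # Number of observed deaths at each time point.
    deaths = Counter(t for t, e in zip(durations, events) if e)
    # Everyone (deaths and censored) who leaves the risk set at each time.
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Patients censored at t count as at risk at t, then drop out.
        at_risk -= exits[t]
    return curve


# Hypothetical toy cohort (months of follow-up, death indicator).
toy_durations = [6, 12, 12, 18, 24, 30]
toy_events = [1, 1, 0, 1, 0, 1]
for time, prob in kaplan_meier(toy_durations, toy_events):
    print(f"t={time:>2}  S(t)={prob:.3f}")
```

Because such a curve ignores covariates, two groups with different stage mixes can show different curves even when treatment has no effect, which is why the abstract stresses the stage-adjusted and stage-stratified comparisons.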