Comparative Survival Analysis of ChatGPT o1, ChatGPT-4o, and Tumor Board Treatment Recommendations in Head and Neck Squamous Cell Carcinoma - A study of 1757 cases

Benedikt Schmidl
Cosima Hoch
Maria Shoykhet
Tobias Weiser
Steffi Pigorsch
Fabian Stögbauer
Barbara Wollenberg
Timon Hussain
Markus Wirth


Abstract

(1) Objectives: Artificial intelligence (AI) has demonstrated potential in supporting decision-making processes in oncology by assisting the multidisciplinary tumor board (MDT). This study evaluates the performance of two AI-based decision-support tools, the most recently introduced ChatGPT o1 and the established ChatGPT-4o, compared to MDT treatment recommendations in head and neck squamous cell carcinoma (HNSCC). Survival outcomes and treatment patterns were analyzed in a large publicly accessible patient cohort.

(2) Materials and Methods: A total of 1,843 HNSCC cases from the University of Michigan SPORE clinical outcomes analytic dataset were analyzed, of which 1,757 patients were included in the final study. Treatment recommendations by ChatGPT o1, ChatGPT-4o, and the MDT were compared using clinical data from the cohort. Survival analysis was performed using Kaplan-Meier curves and Cox proportional hazards models, adjusting for confounders such as tumor stage, smoking status, body mass index (BMI), HPV status, and p16 expression. Subgroup analyses by tumor stage and location were conducted, and treatment modalities were compared across the three groups. For validation purposes, the TCGA HNSCC dataset of 528 cases was analyzed.

(3) Results: ChatGPT o1 treatment recommendations yielded significantly better overall survival compared to both ChatGPT-4o and the MDT in unadjusted Kaplan-Meier analysis (log-rank test p < 0.0001). However, this survival advantage reflected a bias in tumor stage distribution, as ChatGPT o1 favored early-stage patients in its recommendations. After adjustment for covariates, no significant differences in survival were observed between ChatGPT o1, ChatGPT-4o, and the MDT. Tumor stage remained the strongest predictor of survival (HR: 1.34 per stage increment, 95% CI: 1.24–1.44, p < 0.0001). Similar results were also demonstrated in the independent TCGA validation cohort.

(4) Discussion: While ChatGPT o1's treatment recommendations yielded better survival outcomes in this study, these results were driven mostly by stage-based selection bias. After adjustment, no statistically significant survival advantage was observed among the three treatment groups. Differences in treatment modality preferences and patient selection point to AI-driven opportunities and possible "outside-the-box" reasoning for personalizing treatment strategies, and they highlight the importance of stage-stratified analyses when comparing AI-based decision-support tools to traditional tumor boards. This study lays the foundation for the use of ChatGPT o1 in simulating clinical decision-making.
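The stage-distribution effect described in the abstract can be illustrated with a small sketch. This is not the authors' code or data: the cohort below is synthetic, the names `kaplan_meier`, `make_arm`, `curve_for`, `arm_a`, and `arm_b` are hypothetical, and simple stratification by stage stands in for the Cox proportional hazards adjustment the study reports. Under those assumptions, it shows how two arms with identical within-stage survival can still differ in a pooled Kaplan-Meier estimate when one arm contains more early-stage patients.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier estimate as [(time, survival)] at each observed death time.

    durations: follow-up time per patient
    events:    1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # deaths plus censorings at each time
    at_risk = len(durations)
    surv, curve = 1.0, []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, round(surv, 6)))
        at_risk -= leaving[t]
    return curve


# Synthetic (time, event) patterns per stage; purely illustrative values.
EARLY = [(8, 0), (10, 1), (12, 0)]
LATE = [(2, 1), (4, 1), (6, 0)]


def make_arm(n_early, n_late):
    """Build a toy arm from n_early copies of EARLY and n_late copies of LATE."""
    return ([("early", t, e) for t, e in EARLY * n_early]
            + [("late", t, e) for t, e in LATE * n_late])


def curve_for(arm, stage=None):
    """Kaplan-Meier curve for an arm, pooled or restricted to one stage."""
    rows = [(t, e) for s, t, e in arm if stage is None or s == stage]
    return kaplan_meier([t for t, _ in rows], [e for _, e in rows])


arm_a = make_arm(n_early=3, n_late=1)  # mostly early-stage patients
arm_b = make_arm(n_early=1, n_late=3)  # mostly late-stage patients

if __name__ == "__main__":
    print("pooled A:", curve_for(arm_a))
    print("pooled B:", curve_for(arm_b))
    for stage in ("early", "late"):
        print(stage, "A:", curve_for(arm_a, stage), "B:", curve_for(arm_b, stage))
```

In this toy cohort the pooled curve for `arm_a` ends higher than that for `arm_b`, while the stage-specific curves are identical, which is the pattern a stage-stratified or covariate-adjusted analysis is meant to expose.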

Version published to 10.21203/rs.3.rs-5648998/v1 on Research Square
Jan 16, 2025

Related articles

Assessing ChatGPT-4 as a clinical decision support tool in neuro-oncology radiotherapy: a prospective comparative study

This article has 7 authors:
1. Paolo Tini
2. Federica Novi
3. Flavio Donnini
4. Armando Perrella
5. Giulio Bagnacci
6. Maria Antonietta Mazzei
7. Giuseppe Minniti
This article has no evaluations. Latest version: Oct 15, 2025

Feeding Intelligence: Comparative Evaluation of ChatGPT and Clinical Guidelines for Nutritional Management in Head and Neck Cancer

This article has 9 authors:
1. Shasha Shen
2. Kai Zhou
3. Mingna Wu
4. Dahai Liu
5. Xiaotong Shen
6. Peijie Li
7. Ying Xu
8. Sijia Zheng
9. Xiaoxia Gou
This article has no evaluations. Latest version: Sep 5, 2025

The Performance of ChatGPT-4o and DeepSeek-R1 in Interpreting Thyroid Nodule Ultrasound Text Report: A Multicenter Study

This article has 6 authors:
1. Yujie Xie
2. Bing Zhan
3. Kangfan Zhang
4. Yuchen Li
5. Jiarui Liu
6. Chunping Ning
This article has no evaluations. Latest version: Oct 23, 2025
