Assessing the Role of Large Language Learning Models and Artificial Intelligence in Improving Oncology Survivorship Care: A Comparative Study of ChatGPT and Gemini
Abstract
Purpose
Artificial intelligence (AI)-powered large language models (LLMs) are increasingly used in healthcare, yet their ability to deliver accurate, guideline-consistent information in oncology survivorship care has not been adequately evaluated. This study assessed the performance of ChatGPT-4, ChatGPT Strawberry, and Gemini in providing evidence-based responses across survivorship domains, including nutrition, exercise, mental health, and the long-term effects of cancer treatment.

Methods
A set of predefined survivorship care questions was submitted to each LLM. Two independent evaluators rated the responses on five quality metrics: clarity, coherence, completeness, factual accuracy, and relevance. Inter-rater reliability was calculated using Cohen’s kappa. Differences in performance were examined with descriptive statistics, paired t-tests, and mixed-effects models to assess variation by model, topic domain, and question complexity.

Results
Inter-rater agreement was high (κ = 0.83), highest for coherence (κ = 0.98) and lowest for factual accuracy (κ = 0.70). ChatGPT Strawberry consistently outperformed both ChatGPT-4 and Gemini in most domains, especially exercise, nutrition, mental health, and hormone-related symptoms (p < 0.001). ChatGPT-4 performed comparably in fertility and sexual health but lagged in exercise and nutrition. Gemini received the lowest scores on all metrics. Notably, higher-complexity questions yielded higher factual accuracy and completeness scores than medium-complexity questions (p < 0.001).

Conclusions
LLMs show promise as tools for survivorship care education, although accuracy and completeness remain limitations. ChatGPT Strawberry delivered the most consistent, highest-quality performance.

Implications for Cancer Survivors
AI models may supplement survivorship care by offering timely, guideline-consistent information.
However, given current limitations, these tools should complement clinician guidance, with ongoing validation to ensure their safe integration into cancer care.
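The Methods report inter-rater reliability as Cohen's kappa, which corrects the raw agreement between two raters for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the chance agreement implied by each rater's score frequencies. The abstract does not say whether an unweighted or weighted kappa was used, so the sketch below assumes the unweighted form. The function name `cohens_kappa` and the example ratings are hypothetical illustrations, not the study's code or data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Hypothetical helper for illustration; assumes categorical scores
    (e.g. a 1-5 quality scale) given in the same item order for both raters.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: share of items on which both raters gave the same score.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_e == 1:
        # Both raters used one identical category for every item; kappa is
        # undefined here, and this sketch treats it as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scores from two evaluators on eight responses (not study data).
evaluator_1 = [5, 4, 4, 3, 5, 2, 4, 5]
evaluator_2 = [5, 4, 3, 3, 5, 2, 4, 4]
print(round(cohens_kappa(evaluator_1, evaluator_2), 4))  # prints 0.6522
```

Here the raters agree on 6 of 8 items (p_o = 0.75), chance agreement is 18/64 ≈ 0.281, and κ ≈ 0.65. For ordinal quality scales, a weighted kappa that gives partial credit to near-misses is often preferred; scikit-learn's `sklearn.metrics.cohen_kappa_score` offers both forms through its `weights` parameter.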