Assessing the Role of Large Language Learning Models and Artificial Intelligence in Improving Oncology Survivorship Care: A Comparative Study of ChatGPT and Gemini


Abstract

Purpose: Artificial intelligence (AI)-powered large language models (LLMs) are increasingly used in healthcare, yet their ability to deliver accurate, guideline-consistent information in oncology survivorship care remains insufficiently evaluated. This study assessed the performance of ChatGPT-4, ChatGPT Strawberry, and Gemini in providing evidence-based responses related to survivorship domains including nutrition, exercise, mental health, and long-term effects of cancer treatment.

Methods: A set of predefined questions on survivorship care was submitted to each LLM. Two independent evaluators assessed responses using five quality metrics: clarity, coherence, completeness, factual accuracy, and relevance. Inter-rater reliability was calculated using Cohen’s kappa. Differences in AI performance were examined using descriptive statistics, paired t-tests, and mixed-effects models to assess variations by model, topic domain, and question complexity.

Results: Inter-rater agreement was high (κ = 0.83), with the highest agreement in coherence (κ = 0.98) and the lowest in factual accuracy (κ = 0.70). ChatGPT Strawberry consistently outperformed both ChatGPT-4 and Gemini across most domains, especially in exercise, nutrition, mental health, and hormone-related symptoms (p < 0.001). ChatGPT-4 performed comparably in fertility and sexual health but lagged in exercise and nutrition. Gemini demonstrated the lowest scores across all metrics. Notably, higher-complexity questions yielded stronger factual accuracy and completeness scores compared to medium-complexity items (p < 0.001).

Conclusions: LLMs show potential as tools for survivorship care education, though accuracy and completeness remain limitations. ChatGPT Strawberry demonstrated the most consistent and high-quality performance.

Implications for Cancer Survivors: AI models may supplement survivorship care by offering timely, guideline-consistent information. However, given current limitations, these tools should complement clinician guidance, with ongoing validation to ensure their safe integration into cancer care.
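
For readers unfamiliar with the inter-rater statistic named in the Methods, the following is a minimal sketch of unweighted Cohen's kappa for two raters. The function name `cohens_kappa` and the example ratings are hypothetical and are not taken from the study; the abstract does not say whether the authors used the unweighted or a weighted variant, so the unweighted form here is an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e is the agreement expected by chance from each
    rater's marginal category frequencies.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must score the same non-empty item set.")

    n = len(rater_a)

    # Observed agreement: share of items given identical scores.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the two raters' marginal proportions,
    # summed over every category that rater A used.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_e == 1.0:
        # Both raters used a single identical category; agreement is total.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical 1-5 quality scores from two evaluators on ten responses.
scores_rater_1 = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]
scores_rater_2 = [5, 4, 3, 3, 5, 2, 4, 4, 3, 4]
kappa = cohens_kappa(scores_rater_1, scores_rater_2)
print(f"kappa = {kappa:.2f}")
```

Because unweighted kappa treats every disagreement as equally severe, a weighted variant is often preferred for ordinal rating scales; which one applies to the reported values is not stated in the abstract.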
