Assessing the Role of Large Language Learning Models and Artificial Intelligence in Improving Oncology Survivorship Care: A Comparative Study of ChatGPT and Gemini

Hundal Jasmin
Asfand Yar Cheema
Munir Mishaal
Amna Zaheer
Xuefei Jai
Maiti Baidehi

Abstract

Purpose: Artificial intelligence (AI)-powered large language models (LLMs) are increasingly used in healthcare, yet their ability to deliver accurate, guideline-consistent information in oncology survivorship care remains insufficiently evaluated. This study assessed the performance of ChatGPT-4, ChatGPT Strawberry, and Gemini in providing evidence-based responses on survivorship domains, including nutrition, exercise, mental health, and long-term effects of cancer treatment.

Methods: A set of predefined questions on survivorship care was submitted to each LLM. Two independent evaluators assessed the responses using five quality metrics: clarity, coherence, completeness, factual accuracy, and relevance. Inter-rater reliability was calculated using Cohen’s kappa. Differences in model performance were examined with descriptive statistics, paired t-tests, and mixed-effects models to assess variation by model, topic domain, and question complexity.

Results: Inter-rater agreement was high (κ = 0.83), with the highest agreement for coherence (κ = 0.98) and the lowest for factual accuracy (κ = 0.70). ChatGPT Strawberry outperformed both ChatGPT-4 and Gemini in most domains, particularly exercise, nutrition, mental health, and hormone-related symptoms (p < 0.001). ChatGPT-4 performed comparably in fertility and sexual health but lagged in exercise and nutrition. Gemini had the lowest scores across all metrics. Notably, higher-complexity questions yielded higher factual accuracy and completeness scores than medium-complexity questions (p < 0.001).

Conclusions: LLMs show potential as tools for survivorship care education, though accuracy and completeness remain limitations. ChatGPT Strawberry delivered the most consistent, highest-quality performance.

Implications for Cancer Survivors: AI models may supplement survivorship care by offering timely, guideline-consistent information. However, given their current limitations, these tools should complement clinician guidance and undergo ongoing validation to ensure their safe integration into cancer care.
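The abstract reports inter-rater reliability as Cohen's kappa but does not state the rating scale or whether a weighted variant was used. As a minimal sketch, assuming unweighted kappa on a hypothetical 1 to 5 score per response, the statistic compares the observed agreement p_o with the agreement p_e expected by chance from each rater's score frequencies: κ = (p_o − p_e) / (1 − p_e). The function name `cohens_kappa` and the example scores are illustrative assumptions, not the authors' analysis code or data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters gave the same score.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency, summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category for every item; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical 1-5 scores from two evaluators for six responses (not study data).
    evaluator_1 = [5, 4, 4, 3, 5, 2]
    evaluator_2 = [5, 4, 3, 3, 5, 2]
    print(round(cohens_kappa(evaluator_1, evaluator_2), 3))  # 0.778
```

In this toy example the evaluators agree on five of six items, chance agreement is 0.25, and κ = 7/9. An ordinal scale would often be analysed with a weighted kappa instead, which penalises a 5-versus-4 disagreement less than a 5-versus-2 one; the abstract does not say which variant the study used.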

This version was published on Research Square (doi: 10.21203/rs.3.rs-6584974/v1) on May 15, 2025.

Related articles

Hybrid Intelligence in Oncology: Superior Accuracy and Convergence of Large Language Models Over Human Experts in Interpreting

This article has 7 authors:
1. Berna Akkus Yildirim
2. Baver Tutun
3. Gorkem Durak
4. Emre Batuhan Yildirim
5. Emre Uysal
6. Sukru Mehmet Erturk
7. Ulas Bagci
This article has no evaluations. Latest version: Jun 4, 2025

Large Language Models for the assessment of medical students’ clinical decision-making

This article has 5 authors:
1. Sina Chole Benker
2. Jonathan Vollprecht
3. Cihan Papan
4. Max Hao Lu
5. Dogus Darici
This article has no evaluations. Latest version: Jun 17, 2025

High Concordance Between GPT-4o and Multidisciplinary Tumor Board Decisions in Breast Cancer: A Retrospective Decision Support Analysis

This article has 10 authors:
1. Emre Buyukceran
2. Ayça Seyfettin
3. Andelib Babatürk
4. Murat Bulut Özkan
5. Dilşen Çolak
6. İlhami Ünal
7. Esin Kaymaz
8. Esin Ergün
9. Mustafa Özdeş Emer
10. Hüsnü Hakan Mersin
This article has no evaluations. Latest version: Jun 13, 2025