ChatGPT vs. Gemini: Which Provides Better Information on Bladder Cancer?
Abstract
Background: Bladder cancer, the most common and heterogeneous malignancy of the urinary tract, presents with diverse types and treatment options, making comprehensive patient education essential. As Large Language Models (LLMs) emerge as a promising resource for disseminating medical information, their accuracy and validity compared to traditional methods remain under-explored. This study aims to evaluate the effectiveness of LLMs in educating the public about bladder cancer.

Methods: Frequently asked questions regarding bladder cancer were sourced from reputable educational materials and assessed for accuracy, comprehensiveness, readability, and consistency by two independent board-certified urologists, with a third resolving any discrepancies. The study used a 3-point Likert scale for accuracy, a 5-point Likert scale for comprehensiveness, and the Flesch-Kincaid (FK) Grade Level and Flesch Reading Ease (FRE) scores to gauge readability.

Results: ChatGPT-3.5, ChatGPT-4, and Gemini were evaluated on 12 general questions, 6 related to diagnosis, 28 concerning treatment, and 7 focused on prevention. Across all categories, the correct response rate was notably high: ChatGPT-3.5 and ChatGPT-4 each achieved 92.5%, compared with 86.3% for Gemini, with no significant difference in accuracy. However, there was a significant difference in comprehensiveness across the models (p = 0.011). Overall, a significant difference in performance was observed among the LLMs (p < 0.001), with ChatGPT-4 producing responses at the highest (college) reading level, which also made them the most challenging to read.

Conclusion: Our study adds value to the applications of AI in bladder cancer education, with notable insights into the accuracy, comprehensiveness, and consistency of the three LLMs.
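The two readability metrics named in the Methods are defined by fixed formulas over words per sentence and syllables per word. As a minimal illustrative sketch (not the authors' actual tooling), the scores can be computed with a naive vowel-group syllable heuristic; real readability tools use more careful syllable counting:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count groups of consecutive vowels;
    # every word contributes at least one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    # Standard published formulas:
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fk_grade = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fk_grade
```

Under these formulas, longer sentences and more polysyllabic words lower the FRE score (harder to read) and raise the FK grade level, which is how a "college-level" response is identified: an FK grade of roughly 13 or above.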