Quality assessment of patient-facing urologic telesurgery content using validated tools

Tarak Davuluri
Paul Gabriel
Matthew Wainstein
Obi Ekwenna


Abstract

Introduction

With increasing access to artificial intelligence (AI) chatbots, the precision and clarity of the medical information they provide require rigorous assessment. Urologic telesurgery is a complex concept that patients are likely to investigate using AI. We compared ChatGPT and Google Gemini in providing patient-facing information on urologic telesurgical procedures.

Methods

Nineteen questions related to urologic telesurgery were generated using general information from the American Urological Association (AUA) and the European Robotic Urology Section (ERUS). Questions were organized into four categories (Prospective, Technical, Recovery, Other) and typed directly into the free (non-paid) versions of ChatGPT-4o and Google Gemini 2.5. A new chat was started for each question to prevent carryover from earlier answers. Three reviewers independently rated the responses using two validated healthcare tools: DISCERN (quality) and the Patient Education Materials Assessment Tool for printable materials (PEMAT-P; understandability and actionability).
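The two instruments aggregate ratings differently. The sketch below assumes the standard published scoring rules: DISCERN has 16 items rated 1 to 5, summed to a total between 16 and 80, while PEMAT rates each applicable item Agree (1) or Disagree (0), excludes N/A items, and reports the share of applicable items rated Agree as a percentage. The function names, the simple averaging across three reviewers, and the example ratings are hypothetical illustrations, not the authors' code or data.

```python
from statistics import mean
from typing import Optional, Sequence

DISCERN_ITEMS = 16  # DISCERN has 16 questions, each rated 1 (low) to 5 (high)


def discern_total(ratings: Sequence[int]) -> int:
    """Sum one reviewer's 16 DISCERN ratings into a total between 16 and 80."""
    if len(ratings) != DISCERN_ITEMS:
        raise ValueError(f"expected {DISCERN_ITEMS} ratings, got {len(ratings)}")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("each DISCERN rating must be between 1 and 5")
    return sum(ratings)


def pemat_percent(items: Sequence[Optional[bool]]) -> float:
    """PEMAT score: percentage of applicable items rated Agree.

    True = Agree (1 point), False = Disagree (0 points), None = N/A (excluded).
    """
    applicable = [i for i in items if i is not None]
    if not applicable:
        raise ValueError("no applicable PEMAT items")
    return 100 * sum(applicable) / len(applicable)


def reviewer_mean(scores: Sequence[float]) -> float:
    """Hypothetical aggregation: plain average of one response's scores across reviewers."""
    return mean(scores)


# Hypothetical DISCERN ratings from three reviewers for one response (not study data).
reviewers = [
    [4, 3, 4, 3, 3, 2, 4, 3, 3, 4, 3, 2, 3, 3, 3, 3],
    [4, 4, 4, 3, 3, 2, 4, 3, 3, 4, 3, 2, 3, 3, 3, 3],
    [3, 3, 4, 3, 3, 2, 4, 3, 3, 4, 3, 2, 3, 3, 3, 3],
]
totals = [discern_total(r) for r in reviewers]
print(totals)  # [50, 51, 49]
print(reviewer_mean(totals))  # 50

# Hypothetical PEMAT understandability ratings: 3 Agree out of 4 applicable items.
understandability = [True, True, False, None, True]
print(pemat_percent(understandability))  # 75.0
```

Averaging per-reviewer totals is one plausible way to arrive at mean scores like those in the Results; the abstract does not state how reviewer disagreement was reconciled.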

Results

Mean DISCERN scores (out of 80) were higher for Gemini than for ChatGPT in every domain except “Other” (Gemini versus ChatGPT): prospective 49.2 versus 39.1; technical 52.3 versus 44.3; recovery 53.7 versus 45.4; other 54.3 versus 56.5; overall 52.4 versus 45.8 (Fig. 1). PEMAT-P understandability exceeded 70% for both platforms in every domain: prospective 80.0% versus 71.7%; technical 80.1% versus 79.8%; recovery 79.2% versus 80.1%; other 79.2% versus 81.3%; overall 79.7% versus 78.1% (Fig. 2). Actionability was uniformly low; only Gemini met the 70% threshold, and only in the prospective domain (Fig. 3).
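The domain comparisons can be checked against the reported means. This sketch hard-codes the abstract's figures and assumes, as the first sentence of the Results implies, that each pair is listed as Gemini versus ChatGPT; the dictionary layout and the 70% cut-off constant are illustrative choices rather than part of the study's analysis.

```python
# Reported means from the abstract, stored as (Gemini, ChatGPT) pairs.
# Assumption: each reported pair is ordered Gemini first, ChatGPT second.
DISCERN = {  # mean DISCERN total, out of 80
    "prospective": (49.2, 39.1),
    "technical": (52.3, 44.3),
    "recovery": (53.7, 45.4),
    "other": (54.3, 56.5),
    "overall": (52.4, 45.8),
}
UNDERSTANDABILITY = {  # PEMAT-P understandability, percent
    "prospective": (80.0, 71.7),
    "technical": (80.1, 79.8),
    "recovery": (79.2, 80.1),
    "other": (79.2, 81.3),
    "overall": (79.7, 78.1),
}
THRESHOLD = 70.0  # commonly used PEMAT cut-off for adequate material

# Domains where Gemini's mean DISCERN score is higher than ChatGPT's.
gemini_leads = [d for d, (g, c) in DISCERN.items() if g > c]
# Gemini minus ChatGPT, rounded to the one decimal place the abstract reports.
gaps = {d: round(g - c, 1) for d, (g, c) in DISCERN.items()}
# True only if the lower of the two platforms clears the cut-off in every domain.
all_above = all(min(pair) > THRESHOLD for pair in UNDERSTANDABILITY.values())

print(gemini_leads)  # ['prospective', 'technical', 'recovery', 'overall']
print(gaps)  # {'prospective': 10.1, 'technical': 8.0, 'recovery': 8.3, 'other': -2.2, 'overall': 6.6}
print(all_above)  # True
```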

Conclusion

ChatGPT and Gemini deliver relevant and understandable information related to urologic telesurgery, with Gemini more consistently providing sources. However, neither chatbot reliably offers actionable responses, limiting their utility as a standalone gateway for patient decision-making.

Version published as 10.1007/s11701-025-02871-8 (Oct 14, 2025)
Version published on Research Square as 10.21203/rs.3.rs-7527866/v1 (Sep 12, 2025)

Related articles

Evaluation of AI-Generated Multiple-Choice Questions for Periodontology Exams: A Quality Assessment Study
Authors (6): Bushra Ahmad, Livia Valverde, Shruti Jain, Khaled Saleh, Nadeem Karimbux, Y. Natalie Jeong
Latest version Jan 19, 2026

Evaluation of robotic exposure among gynecological surgeons: results of survey from the Young European Advocates of Robotic Surgery (YEARS)
Authors (9): Sergi Fernandez-Gonzalez, Dina El-Hamamsy, Eleni Karatrasoglou, Anumithra Amirthanayagam, Alberto Muñoz Solano, Charlotte Collet, Daniel Galvin, Manou Manpreet Kaur, Christina Uwins
Latest version Dec 22, 2025

Comparative efficacy of ChatGPT-5.1 Auto and DeepSeek-V3.1 Thinking in answering patients’ questions on cervical spine surgery
Authors (4): Xiaoyang Huo, Jiaming Zhou, Rongzhi Ma, Yuan Xue
Latest version Jan 23, 2026
