The use of artificial intelligence based chat bots in ophthalmology triage

Daniel David
Ofira Zloto
Gabriel Katz
Ruth Huna-Baron
Vicktoria Vishnevskia-Dai
Sharon Armarnik
Noa Avni Zauberman
Elinor Megiddo Barnir
Reut Singer
Avner Hostovsky
Eyal Klang

Abstract

Purpose

To evaluate the ability of AI-based chat bots to accurately answer common patient questions in the field of ophthalmology.

Methods

An experienced ophthalmologist curated a set of 20 representative questions, and responses were sought from two generative AI models: OpenAI’s ChatGPT and Google’s Bard (Gemini Pro). Eight expert ophthalmologists from different sub-specialties, blinded to the source, rated each response on three metrics (accuracy, comprehensiveness, and clarity) using a 1–5 scale.

Results

For accuracy, ChatGPT scored a median of 4.0, whereas Bard scored a median of 3.0. For comprehensiveness, ChatGPT achieved a median score of 4.5, compared with 3.0 for Bard. For clarity, ChatGPT again scored higher, with a median of 5.0 compared with Bard’s 4.0. All comparisons were statistically significant (p < 0.001).

Conclusion

AI-based chat bots can provide relatively accurate and clear responses to common ophthalmological inquiries. ChatGPT surpassed Bard in all measured metrics. While these AI models show promise, further research is needed to improve their performance and allow them to be used as a reliable medical tool.

Version published to 10.1038/s41433-024-03488-1
Nov 26, 2024
Version published to 10.21203/rs.3.rs-4406223/v1 on Research Square
Aug 9, 2024

