Assessing ChatGPT’s Performance in Delineating Uveitis: An analysis of responses to real-world case presentations
Abstract
Background
In the field of Artificial Intelligence (AI), Generative Pretrained Transformer-3 (GPT-3) has gained significant popularity for its demonstrated potential in medical education and diagnostics.
Rationale
While AI has shown promising results in healthcare thus far, its understanding of ocular urgencies, particularly uveitis, demands a focused investigation.
Methods
This study explored the application of ChatGPT, a language model derived from GPT-3, in delineating uveitis based on patient presentations and investigations. We analyzed the quality of ChatGPT's communication across 14 qualitative metrics by supplying patient data as prompts at four incremental levels: patient history, drug history, examination findings, and clinical investigations.
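For illustration only, a staged-prompt protocol of this kind could be scripted as in the minimal sketch below. This is not the authors' actual procedure (the study queried ChatGPT interactively), and `query_chatgpt` is a hypothetical stand-in for whatever model interface is used; responses would still be rated manually against the qualitative metrics.

```python
# Minimal sketch of a staged-prompt evaluation. `query_chatgpt` is a
# hypothetical callable (prompt -> response text); the real study used
# the ChatGPT interface directly rather than code.

PROMPT_LEVELS = [
    "patient history",
    "drug history",
    "examination findings",
    "clinical investigations",
]

# Each response is later rated on 14 qualitative metrics using
# categories like "comprehensive", "correct but inadequate",
# "mixed accurate and outdated data", and "completely inaccurate".

def build_prompt(case: dict, level: int) -> str:
    """Concatenate the case data available up to and including `level`."""
    parts = [case[key] for key in PROMPT_LEVELS[: level + 1]]
    return "Patient presentation:\n" + "\n".join(parts)

def collect_responses(case: dict, query_chatgpt) -> list[str]:
    """Return one model response per prompt level for manual rating."""
    return [
        query_chatgpt(build_prompt(case, lvl))
        for lvl in range(len(PROMPT_LEVELS))
    ]
```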
Results
Our results showed that, at the initial prompt, ChatGPT's responses were comprehensive for most (8 of 14) variables and correct but inadequate for some (3 of 14) variables in the majority (>50.0%) of uveitis cases. Ethical considerations was the only variable for which responses consistently showed a mix of accurate and outdated data across all prompts in most (95.8%) uveitis cases. Notably, none of ChatGPT's responses was completely inaccurate for any variable, at any prompt, in any uveitis case.
Conclusion
The results reveal ChatGPT's strengths and limitations in answering queries for patients with uveitis or its differential diagnoses, while emphasizing the indispensable role of physicians in ethical decision-making.