ChatGPT Is Still Not Good Enough at Giving Care-Seeking Advice, or Is It?

Marvin Kopka
Longqi He
Markus A. Feufel

Abstract

Artificial Intelligence tools like ChatGPT are increasingly used by patients to support their care-seeking decisions, although the accuracy of newer models remains unclear. We evaluated 16 ChatGPT models using 45 validated vignettes, each prompted ten times (7,200 total assessments). Each model classified the vignettes as requiring emergency care, non-emergency care, or self-care. We evaluated accuracy against each case’s gold standard solution, examined the variability across trials, and tested algorithms to aggregate multiple recommendations to improve accuracy. o1-mini achieved the highest accuracy (78%), but we could not observe an overall improvement with newer models – although reasoning models (e.g., o4-mini) improved their accuracy in identifying self-care cases. Selecting the lowest urgency level across multiple trials improved accuracy by 4 percentage points. Although newer models slightly outperform laypeople, their accuracy remains insufficient for standalone use. However, making use of output variability with aggregation algorithms can improve the performance of these models.
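
To make the aggregation idea concrete, the following is a minimal sketch of the "lowest urgency across trials" rule mentioned in the abstract, together with the arithmetic behind the 7,200 assessments. The label strings, the numeric urgency ranks, the function names, and the toy data are all assumptions made for illustration; they are not taken from the preprint and do not reproduce its results.

```python
# Assumed ordering of the three care levels named in the abstract
# (lower rank = less urgent). The exact labels are hypothetical.
URGENCY_RANK = {"self-care": 0, "non-emergency": 1, "emergency": 2}


def lowest_urgency(recommendations):
    """Aggregate repeated trials for one vignette by keeping the least urgent label."""
    return min(recommendations, key=lambda label: URGENCY_RANK[label])


def accuracy(predictions, gold):
    """Share of predictions that match the gold standard label."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Study size as stated in the abstract: 16 models x 45 vignettes x 10 trials.
total_assessments = 16 * 45 * 10

# Invented toy data: three vignettes, three trials each (not study data).
trials = [
    ["non-emergency", "self-care", "non-emergency"],
    ["emergency", "emergency", "emergency"],
    ["emergency", "non-emergency", "non-emergency"],
]
gold = ["self-care", "emergency", "non-emergency"]

first_trial_only = [t[0] for t in trials]
aggregated = [lowest_urgency(t) for t in trials]

print(total_assessments)
print(accuracy(first_trial_only, gold), accuracy(aggregated, gold))
```

In this toy example the rule helps because the single-trial errors lean toward over-triage; whether that holds for a given model depends on the direction of its errors.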

Version published to 10.1101/2025.05.13.25327519 on medRxiv
May 14, 2025

Related articles

Artificial Intelligence in Clinical Practice: Evaluating Chatbot Performance on Board-Level Questions in Geriatrics

This article has 2 authors:
1. Mert Zure
2. Metin Sökmen
This article has no evaluations. Latest version: Jan 21, 2026

Poetic or Prosaic? Evaluating the Linguistic Quality of AI-Generated Draft Replies to Patient Portal Messages

This article has 8 authors:
1. Gavin Hui
2. Laura Prichard
3. Taylor Martin
4. Sitaram Vangala
5. Joshua Khalili
6. Sun M. Yoo
7. Hawkin E. Woo
8. Paul J. Lukac
This article has no evaluations. Latest version: Dec 11, 2025

UnderstandingMCI.ca: Mixed-Methods Evaluation of a Brief Web-Based Multimedia Lesson to Improve Public and Family Care Partner Knowledge of Mild Cognitive Impairment

This article has 7 authors:
1. Victoria Meng
2. Dima Hadid
3. Stephanie Ayers
4. Sandra Clark
5. Rebekah Woodburn
6. Roland Grad
7. Anthony J. Levinson
This article has no evaluations. Latest version: Dec 25, 2025
