ChatGPT Performance in a Questionnaire on Rheumatological Diseases: A Comparison with Specialists' Opinion
Abstract
Background: This study compares the performance of Chat Generative Pre-trained Transformer 4.0 (ChatGPT 4.0) with that of rheumatologists of varying experience levels on a questionnaire covering systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), ankylosing spondylitis (AS), psoriatic arthritis (PsA), and fibromyalgia (FM).

Methods: In this cross-sectional study, a 25-question questionnaire (five questions per disease) was administered to ChatGPT 4.0 and to four pairs of rheumatologists with different experience levels (less than 5 years, 5–10 years, 11–20 years, and 21–30 years). Two rheumatologists with more than 30 years of experience, both affiliated with academic services, blindly rated each response as "agree" or "disagree". When the two evaluators disagreed on a question, a third rheumatologist resolved the dispute.

Results: The group with 5–10 years of experience had the best overall performance, with a 70% probability of agreement from the evaluators, followed by ChatGPT 4.0 at 68%. The group with 21–30 years of experience performed worst (58%). ChatGPT 4.0 outperformed all other groups on questions about the first-line treatment option and the most effective imaging exams for investigation (100% on both). However, it performed worst at identifying the most useful sign or symptom for diagnosing each disease.

Conclusions: ChatGPT 4.0 excelled in areas requiring less practical knowledge, such as treatment choices and diagnostic imaging exams. Conversely, it performed poorly on questions requiring experience-based knowledge, particularly identifying key diagnostic signs and symptoms.