ChatGPT Performance in a Questionnaire on Rheumatological Diseases: A Comparison with Specialists' Opinion
Abstract
Background: This study compares the performance of Chat Generative Pre-trained Transformer 4.0 (ChatGPT 4.0) with that of rheumatologists of varying experience levels on a questionnaire covering systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), ankylosing spondylitis (AS), psoriatic arthritis (PsA), and fibromyalgia (FM).

Methods: In this cross-sectional study, a 25-question questionnaire (five questions per disease) was administered to ChatGPT 4.0 and to four pairs of rheumatologists with different experience levels (less than 5 years, 5–10 years, 11–20 years, and 21–30 years). Two rheumatologists with more than 30 years of experience, both affiliated with academic services, blindly rated each response as "agree" or "disagree". When the two evaluators disagreed on a question, a third rheumatologist resolved the dispute.

Results: The group with 5–10 years of experience had the best overall performance, with a 70% probability of agreement from the evaluators, followed by ChatGPT 4.0 at 68%. The group with 21–30 years of experience performed worst (58%). ChatGPT 4.0 outperformed all other groups on questions about the first-line treatment option and the most effective imaging exams for investigation (100% on both). However, it performed worst at identifying the most useful sign or symptom for diagnosing each disease.

Conclusions: ChatGPT 4.0 excelled in areas requiring less practical knowledge, such as treatment choices and diagnostic imaging exams. Conversely, it performed poorly on questions requiring experience-based knowledge, particularly identifying key diagnostic signs and symptoms.