Comparison of Intern Doctors and ChatGPT in Emergency Case Assessment
Abstract
Background: Accurate and timely diagnosis in emergency departments (EDs) is critical due to the high patient volume and time-sensitive nature of care. In many countries, intern doctors (IDs), who are about to graduate from medical school, work in EDs for a period of their training. After graduation, however, physicians are often expected to take on critical patient care responsibilities despite limited experience. Artificial intelligence (AI) models can rapidly analyze patient data and generate diagnoses, potentially helping inexperienced physicians improve diagnostic accuracy. This study aims to evaluate the diagnostic performance of ChatGPT-4 in ED case scenarios and compare its accuracy with that of IDs. Methods: This study was conducted with IDs participating in the internship program during the 2024 academic term. A total of 36 case-based questions, categorized by difficulty level, were prepared. These questions were administered to 155 IDs via Google Documents and subsequently presented to ChatGPT-4. Descriptive statistics were used to summarize the data, and a one-sample t-test was performed to compare diagnostic accuracy between IDs and ChatGPT-4. Statistical significance was set at p < 0.05. Results: IDs achieved an overall correct response rate of 58.3%, while ChatGPT-4 reached 97.2%. A statistically significant, moderate negative correlation was observed between question difficulty and IDs' performance (r = -0.684; p < 0.001), indicating that accuracy decreased as question difficulty increased. ChatGPT-4 maintained significantly higher performance regardless of difficulty level. Conclusion: ChatGPT-4 may serve as a valuable diagnostic support tool in EDs, particularly for newly graduated physicians with limited clinical experience. Clinical trial number: not applicable.
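For readers unfamiliar with the statistical approach described in the Methods, the sketch below illustrates (in Python, with SciPy) how a one-sample t-test and a difficulty–accuracy correlation of this kind are typically computed. All accuracy and difficulty values here are hypothetical placeholders, not the study's data; only the reference rate (97.2%) is taken from the abstract.

```python
# Illustrative sketch only, NOT the authors' analysis code.
# One-sample t-test: compare hypothetical per-question ID accuracy
# against a fixed reference value, as described in the Methods.
from scipy import stats

# Hypothetical per-question proportion of IDs answering correctly
id_accuracy = [0.90, 0.80, 0.75, 0.70, 0.65, 0.60,
               0.55, 0.50, 0.45, 0.40, 0.35, 0.30]

# Reference value: ChatGPT-4's overall correct-response rate (97.2%)
reference = 0.972

t_stat, p_value = stats.ttest_1samp(id_accuracy, popmean=reference)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# Pearson correlation between question difficulty rank and ID accuracy,
# analogous to the reported r = -0.684 (difficulty ranks are hypothetical)
difficulty = list(range(1, len(id_accuracy) + 1))
r, p_corr = stats.pearsonr(difficulty, id_accuracy)
print(f"r = {r:.3f}, p = {p_corr:.4f}")
```

With the placeholder values above, the t statistic is negative (ID accuracy below the reference) and the correlation is negative (accuracy falls as difficulty rises), mirroring the direction of the reported findings.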