ChatGPT Against Medical Students: A Comparative Analysis of Image-Based Medical Examination Results
Abstract
Background
Artificial intelligence (AI), particularly large language models such as ChatGPT, is increasingly shaping medical education. While these systems show promise for automated feedback and adaptive assessment, their performance in visually intensive, image-based disciplines remains insufficiently studied.

Objective
To compare the performance of ChatGPT-4.0 and undergraduate medical students on standardized, image-based multiple-choice questions in Anatomy, Pathology, and Pediatrics, and to evaluate domain-specific strengths and limitations of generative AI in visual reasoning.

Methods
Standardized exams were administered to second-, third-, and fifth-year students, and the same questions were submitted to ChatGPT-4.0 using a two-step deterministic and stochastic protocol. Items whose images ChatGPT failed to recognize were excluded. Student responses were pooled after verifying normality, variance, and sample-size equivalence, with subgroup analyses restricted to questions with a discrimination index ≥ 0.1. Paired t-tests or Wilcoxon signed-rank tests were used for comparisons.

Results
Of 90 questions, only 52 were eligible for analysis because ChatGPT could not interpret the remaining images. ChatGPT significantly underperformed students in Anatomy (mean difference = −0.387, p < 0.00001, d = 2.10), with similar results in the discrimination-based subgroup. In contrast, ChatGPT outperformed students in Pediatrics (mean difference = +0.174, p = 0.00013, d = 0.81), with a larger effect in the validated subgroup. Pathology was excluded from comparison because ChatGPT failed to recognize all of its images.

Conclusion
These findings demonstrate marked variability in ChatGPT's visual reasoning across medical domains, underscoring the need for multimodal integration and critical evaluation of AI applications in medical education.
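The analysis described in the Methods (normality screening of paired differences, then a paired t-test or Wilcoxon signed-rank test, with Cohen's d as the effect size) can be illustrated with the minimal sketch below. This is not the authors' code; the per-question accuracy arrays are hypothetical placeholders, and the Shapiro-Wilk test is assumed as the normality check.

```python
# Illustrative sketch: compare per-question accuracy between ChatGPT and pooled
# students, selecting the test based on normality of the paired differences.
import numpy as np
from scipy import stats

# Hypothetical per-question proportions correct for the eligible items
student_acc = np.array([0.62, 0.55, 0.71, 0.48, 0.80, 0.66, 0.59, 0.73])
chatgpt_acc = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0])

diff = chatgpt_acc - student_acc

# Normality check on the paired differences decides which test to use
shapiro_p = stats.shapiro(diff).pvalue
if shapiro_p > 0.05:
    stat, p = stats.ttest_rel(chatgpt_acc, student_acc)
    test = "paired t-test"
else:
    stat, p = stats.wilcoxon(chatgpt_acc, student_acc)
    test = "Wilcoxon signed-rank test"

# Cohen's d for paired samples: mean difference over SD of the differences
cohens_d = diff.mean() / diff.std(ddof=1)

print(f"{test}: statistic={stat:.3f}, p={p:.5f}, "
      f"mean difference={diff.mean():+.3f}, d={cohens_d:.2f}")
```

A positive mean difference in this sketch would correspond to ChatGPT outperforming students on the item set (as reported for Pediatrics), and a negative one to underperformance (as reported for Anatomy).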