Comparative performance of ChatGPT, Gemini, and DeepSeek on endodontic exam questions in Turkish and English

Abstract

Objective: This study aimed to compare the performance of ChatGPT-4, Gemini 2.0, and DeepSeek-R1 in answering dentistry specialty exam (DUS) endodontics questions in Turkish and English.

Methods: A total of 130 multiple-choice endodontics questions from the DUS question pool were presented to ChatGPT-4 (OpenAI), Gemini 2.0 (Google), and DeepSeek-R1 under standardized conditions in both languages. Responses were categorized as "correct answer with correct explanation," "correct answer with incorrect explanation," or "incorrect." Statistical analysis was performed in R using McNemar's Chi-squared test and Fisher's Exact Test (significance level: p < 0.05).

Results: All models performed better in English than in Turkish. In Turkish, DeepSeek-R1 and Gemini 2.0 significantly outperformed ChatGPT-4. All models answered simple-style questions more accurately than combination-style questions in both languages.

Conclusions: LLMs show potential on standardized dental exams but still struggle to fully grasp conceptual knowledge and may generate hallucinations. Continued development is needed to improve their accuracy across languages and subject areas.
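For readers unfamiliar with the paired design implied by the Methods, the sketch below shows how McNemar's Chi-squared test and Fisher's Exact Test can be applied in R to data of this shape. All counts are hypothetical placeholders, not the study's results, and this is a minimal illustration rather than the authors' analysis script.

```r
# Hypothetical 2x2 table of correct/incorrect answers for two models on the
# same 130 questions (paired design), e.g. ChatGPT-4 vs. DeepSeek-R1.
#                     DeepSeek correct   DeepSeek incorrect
# ChatGPT correct            70                 10
# ChatGPT incorrect          30                 20
paired_counts <- matrix(c(70, 10,
                          30, 20),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(ChatGPT  = c("correct", "incorrect"),
                                        DeepSeek = c("correct", "incorrect")))

# McNemar's test uses only the discordant cells (10 vs. 30) to ask whether
# the two models' accuracy differs on the same set of questions.
mcnemar.test(paired_counts)

# Fisher's Exact Test suits unpaired comparisons with small expected counts,
# e.g. accuracy on simple- versus combination-style questions (again with
# hypothetical counts).
style_counts <- matrix(c(60, 15,
                         25, 30),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(Style  = c("simple", "combination"),
                                       Answer = c("correct", "incorrect")))
fisher.test(style_counts)
```

Both functions are in base R's stats package; McNemar's test is the standard choice here because each question yields a paired observation across models.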
