Performance of ChatGPT on a Turkish Board of Orthopaedic Surgery Examination

Süleyman Kaan Öner
Bilgehan Ocak
Yavuz Şahbat
Recep Yasin Kurnaz
Emre Çilingir

Abstract

Background
This study aimed to evaluate the performance of ChatGPT on the Turkish Board of Orthopedic Surgery Examination.

Methods
All written exam questions prepared by TOTEK between 2021 and 2023 were considered, except canceled questions and, in line with the literature, questions requiring visual information. The questions were divided into 19 categories by topic and into 3 categories by the type of knowledge assessed: direct recall of information, ability to interpret, and ability to apply information correctly. The questions were posed separately to ChatGPT 3.5 and ChatGPT 4.0, and all answers were evaluated according to this grouping. Visual questions were not posed to ChatGPT because it could not interpret visual content. A question was counted as correct only if the application gave both the correct choice and a correct explanation; questions answered incorrectly were counted as incorrect.

Results
We excluded 300 visual questions in total and asked ChatGPT the remaining 265 multiple-choice questions. Of these 265 questions, 95 (35%) were answered correctly and 169 (63%) incorrectly; ChatGPT did not answer 1 question. The exam success rate was greater for the ChatGPT group than for the control group, especially for infection questions (67%). The descriptive findings, shown in Table 3, indicate that both artificial intelligence models are effective to differing degrees across topics, with GPT-4 generally performing better.

Conclusion
Our study showed that although ChatGPT did not reach the passing level of the Turkish Orthopedics and Traumatology Proficiency Exam, it achieved a certain level of accuracy. Software such as ChatGPT needs further development and study before it can be useful to orthopedics and traumatology physicians, for whom the evaluation of radiological images and physical examination is essential.
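The overall proportions in the Results can be recomputed directly from the reported counts. The sketch below is illustrative only: it uses just the three counts stated in the abstract (95 correct, 169 incorrect, 1 unanswered out of 265), and the helper name `summarize` is hypothetical, not part of the study's analysis.

```python
# Counts reported in the Results section (ChatGPT answers to the
# 265 non-visual multiple-choice questions).
TOTAL_QUESTIONS = 265
OUTCOMES = {"correct": 95, "incorrect": 169, "unanswered": 1}


def summarize(outcomes: dict[str, int], total: int) -> dict[str, float]:
    """Hypothetical helper: each outcome's share of `total` as a percentage (1 decimal)."""
    # The outcome groups should account for every question that was asked.
    if sum(outcomes.values()) != total:
        raise ValueError("outcome counts do not add up to the total")
    return {name: round(100 * count / total, 1) for name, count in outcomes.items()}


if __name__ == "__main__":
    for name, pct in summarize(OUTCOMES, TOTAL_QUESTIONS).items():
        print(f"{name}: {pct}%")
```

Rounded to one decimal place, this gives 35.8% correct, 63.8% incorrect and 0.4% unanswered.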

Version published to 10.21203/rs.3.rs-4637339/v1 on Research Square
Aug 6, 2024

Related articles

Evaluation of AI-Generated Multiple-Choice Questions for Periodontology Exams: A Quality Assessment Study

This article has 6 authors:
1. Bushra Ahmad
2. Livia Valverde
3. Shruti Jain
4. Khaled Saleh
5. Nadeem Karimbux
6. Y. Natalie Jeong
This article has no evaluations. Latest version Jan 19, 2026
Comparison of the ChatGPT and deepseek models in responding to multiple choice questions related to rehabilitation of completely edentulous patients with complete dentures

This article has 4 authors:
1. Amar Bhochhibhoya
2. Brijesh Maskey
3. Rejina Shrestha
4. Sirjana Dahal
This article has no evaluations. Latest version Jan 30, 2026
Comparative efficacy of ChatGPT-5.1 Auto and DeepSeek-V3.1 Thinking in answering patients’ questions on cervical spine surgery

This article has 4 authors:
1. Xiaoyang Huo
2. Jiaming Zhou
3. Rongzhi Ma
4. Yuan Xue
This article has no evaluations. Latest version Jan 23, 2026
