Evaluating ChatGPT-4o's Reasoning in Ocular Injury Using NEISS Data

Abstract

Purpose: This study evaluates ChatGPT-4o's reasoning process in triaging ocular injury cases drawn from the National Electronic Injury Surveillance System (NEISS). The goal is to analyze the reasoning used by the model and its accuracy.

Design: Retrospective cohort study.

Participants: 2,145 cases of ocular injuries were randomly selected from the NEISS database under an IRB-exempt protocol.

Methods: The 2,145 ocular injury cases were randomly sampled and evenly distributed across three triage levels: emergent, urgent, and routine. ChatGPT-4o was tasked with assigning a triage level, recommending interventions, and providing reasoning for each decision. The model's reasoning was categorized into four types: 1) "Blunt trauma may cause delayed complications like retinal detachment," 2) "Chemical injuries can cause severe ocular damage and require immediate attention," 3) "Foreign body cases can cause vision-threatening complications if not treated urgently," and 4) "The injury appears non-urgent based on provided details." We evaluated the frequency and accuracy of each reasoning type and analyzed its association with triage categories.

Main Outcome Measures: Accuracy of ChatGPT-4o in correctly triaging emergent, urgent, and routine cases.

Results: ChatGPT-4o correctly categorized 60% of all cases. The model frequently defaulted to routine classification, contributing to under-recognition of urgent cases.

Conclusion: ChatGPT-4o shows potential for triaging ocular injuries, especially in identifying emergent cases, but it struggles with nuanced reasoning.
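The abstract does not describe the prompting or scoring pipeline in detail; the sketch below is a minimal illustration of how such an evaluation could be run, assuming the OpenAI Chat Completions API with the gpt-4o model. The prompt wording, the helper names (triage_case, accuracy), and the response format are hypothetical and not taken from the study.

```python
# Minimal sketch of an LLM triage evaluation loop (assumptions: OpenAI Python SDK,
# gpt-4o model, NEISS cases loaded as dicts with a "narrative" and a ground-truth
# "label" of emergent/urgent/routine). Not the study's actual implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TRIAGE_LEVELS = {"emergent", "urgent", "routine"}

PROMPT_TEMPLATE = (
    "You are an ophthalmic triage assistant. For the following NEISS case "
    "narrative, assign one triage level (emergent, urgent, or routine), "
    "recommend an intervention, and explain your reasoning.\n\n"
    "Case: {narrative}\n\n"
    "Answer in the form:\nTriage: <level>\nIntervention: <text>\nReasoning: <text>"
)

def triage_case(narrative: str) -> dict:
    """Ask the model to triage one case and parse its structured reply."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(narrative=narrative)}],
        temperature=0,
    )
    text = response.choices[0].message.content
    parsed = {"triage": None, "intervention": None, "reasoning": None}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip().lower()
        if key in parsed:
            parsed[key] = value.strip()
    if parsed["triage"]:
        parsed["triage"] = parsed["triage"].lower()
    return parsed

def accuracy(cases: list[dict]) -> float:
    """Fraction of cases where the model's triage level matches the ground truth."""
    correct = 0
    for case in cases:
        result = triage_case(case["narrative"])
        if result["triage"] in TRIAGE_LEVELS and result["triage"] == case["label"]:
            correct += 1
    return correct / len(cases)
```

Comparing the parsed triage level against the ground-truth label per case, as in the accuracy helper above, is one straightforward way to reproduce an overall accuracy figure such as the 60% reported in the Results.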
