Evaluating ChatGPT-4o's Reasoning in Ocular Injury Using NEISS Data

Abstract

Purpose: This study evaluates ChatGPT-4o's reasoning process in triaging ocular injury cases drawn from the National Electronic Injury Surveillance System (NEISS). The goal is to analyze the reasoning used by the model and its accuracy.

Design: Retrospective cohort study.

Participants: 2,145 cases of ocular injuries were randomly selected from the NEISS database under an IRB-exempt protocol.

Methods: The 2,145 ocular injury cases were randomly sampled and evenly distributed across three triage levels: emergent, urgent, and routine. ChatGPT-4o was tasked with assigning a triage level, recommending interventions, and providing reasoning for each decision. The model's reasoning was categorized into four types: 1) "Blunt trauma may cause delayed complications like retinal detachment," 2) "Chemical injuries can cause severe ocular damage and require immediate attention," 3) "Foreign body cases can cause vision-threatening complications if not treated urgently," and 4) "The injury appears non-urgent based on provided details." We evaluated the frequency and accuracy of each reasoning type and analyzed its association with triage categories.

Main Outcome Measures: Accuracy of ChatGPT-4o in correctly triaging emergent, urgent, and routine cases.

Results: ChatGPT-4o correctly categorized 60% of all cases. The model frequently defaulted to routine classification, contributing to under-recognition of urgent cases.

Conclusion: ChatGPT-4o shows potential for triaging ocular injuries, especially in identifying emergent cases, but it struggles with nuanced reasoning.
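The abstract does not describe the prompting or scoring pipeline in detail; the sketch below is a minimal illustration of how such an evaluation could be run, assuming the OpenAI Chat Completions API with the gpt-4o model. The prompt wording, the helper names (triage_case, accuracy), and the response format are hypothetical and not taken from the study.

```python
# Minimal sketch of an LLM triage evaluation loop (assumptions: OpenAI Python SDK,
# gpt-4o model, NEISS cases loaded as dicts with a "narrative" and a ground-truth
# "label" of emergent/urgent/routine). Not the study's actual implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TRIAGE_LEVELS = {"emergent", "urgent", "routine"}

PROMPT_TEMPLATE = (
    "You are an ophthalmic triage assistant. For the following NEISS case "
    "narrative, assign one triage level (emergent, urgent, or routine), "
    "recommend an intervention, and explain your reasoning.\n\n"
    "Case: {narrative}\n\n"
    "Answer in the form:\nTriage: <level>\nIntervention: <text>\nReasoning: <text>"
)

def triage_case(narrative: str) -> dict:
    """Ask the model to triage one case and parse its structured reply."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(narrative=narrative)}],
        temperature=0,
    )
    text = response.choices[0].message.content
    parsed = {"triage": None, "intervention": None, "reasoning": None}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip().lower()
        if key in parsed:
            parsed[key] = value.strip()
    if parsed["triage"]:
        parsed["triage"] = parsed["triage"].lower()
    return parsed

def accuracy(cases: list[dict]) -> float:
    """Fraction of cases where the model's triage level matches the ground truth."""
    correct = 0
    for case in cases:
        result = triage_case(case["narrative"])
        if result["triage"] in TRIAGE_LEVELS and result["triage"] == case["label"]:
            correct += 1
    return correct / len(cases)
```

Comparing the parsed triage level against the ground-truth label per case, as in the accuracy helper above, is one straightforward way to reproduce an overall accuracy figure such as the 60% reported in the Results.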
