“This is a quiz” Premise Input: A Key to Unlocking Higher Diagnostic Accuracy in Large Language Models
Abstract
Purpose
Large language models (LLMs) are neural network models trained on vast amounts of textual data, and they have shown promising performance in various fields. In radiology, studies have demonstrated strong LLM performance on diagnostic imaging quiz cases. However, the prior probability of a given final diagnosis differs inherently between clinical and quiz cases. Human physicians adjust their diagnoses to the situation, consciously or unconsciously, whereas the LLMs in previous studies were not informed that the cases were quizzes. The present study tested the hypothesis that notifying LLMs of the quiz nature improves diagnostic accuracy.
Methods
One hundred and fifty consecutive cases from the “Case of the Week” radiological diagnostic quiz case series on the American Journal of Neuroradiology website were analyzed. GPT-4o and Claude 3.5 Sonnet were used to generate the top three differential diagnoses based on the textual clinical history and figure legends. For both models, prompts were run with and without information about the quiz nature. Two radiologists evaluated the accuracy of the diagnoses. McNemar’s test was used to assess differences in correct response rates.
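Because each case is answered under both prompt conditions, the comparison is paired, and McNemar’s test uses only the discordant cases (correct under one condition but not the other). The following is a minimal sketch of the exact (binomial) form of the test; the abstract does not state which variant the authors used, and the function name `mcnemar_exact` and the ten-case data are hypothetical, made up purely for illustration.

```python
from math import comb


def mcnemar_exact(correct_with, correct_without):
    """Exact two-sided McNemar test on paired correct/incorrect outcomes.

    correct_with / correct_without: sequences of booleans, one per case,
    for the prompt with and without the quiz premise (hypothetical names).
    Returns (b, c, p_value), where b and c are the discordant counts.
    """
    if len(correct_with) != len(correct_without):
        raise ValueError("paired inputs must have equal length")

    pairs = list(zip(correct_with, correct_without))
    # b: correct only with the premise; c: correct only without it.
    b = sum(1 for w, wo in pairs if w and not wo)
    c = sum(1 for w, wo in pairs if wo and not w)

    n = b + c
    if n == 0:
        # No discordant pairs: the data give no evidence of a difference.
        return b, c, 1.0

    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Made-up outcomes for ten cases (not data from the study).
with_premise = [True] * 5 + [False] + [True] * 3 + [False]
without_premise = [False] * 5 + [True] + [True] * 3 + [False]

b, c, p_value = mcnemar_exact(with_premise, without_premise)
print(f"discordant pairs: b={b}, c={c}; exact p={p_value:.5f}")
```

Cases that both conditions answer correctly, or both answer incorrectly, do not affect the result, which is why the test suits a design in which the same 150 cases are evaluated twice.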
Results
Informing the models of the quiz nature improved the diagnostic performance of both. Specifically, the accuracy of Claude 3.5 Sonnet’s primary diagnosis and of GPT-4o’s top three differential diagnoses improved significantly when the quiz nature was disclosed.
Conclusion
Informing LLMs of the quiz nature of cases significantly enhances their diagnostic performance. This insight into LLMs’ capabilities could inform future research and applications, and it highlights the importance of context in optimizing LLM-based diagnostics.