Simple Prompting Enhances ChatGPT’s Diagnostic Accuracy in Psychiatric Cases

Abstract

Despite the centrality of the diagnostic assessment in psychiatry, diagnostic agreement among mental health practitioners often ranges from poor to moderate. Among other approaches, Large Language Models (LLMs) such as ChatGPT have been studied as potential standardized tools to support clinicians' decision-making. The current work investigates the diagnostic accuracy of ChatGPT 3.5 across different case presentation styles (i.e., vignette and outline) and prompting techniques. A total of 46 psychiatric cases with an accompanying diagnosis were used. Two trained clinical psychologists evaluated the accuracy of the generated diagnoses against the reference diagnoses. A robust statistical approach was then used to investigate the effect of case format and prompt type on average diagnostic accuracy. The results showed moderate agreement between the ratings of the two clinical psychologists (kappa = 0.687). Moreover, a statistically significant main effect of prompting technique on ChatGPT's diagnostic accuracy emerged (p = 0.009). Accuracy was highest when ChatGPT 3.5 was simply instructed to provide and justify a single diagnosis for each case, compared to when it was asked to provide diagnosis likelihoods (p < 0.001) or to act as a clinical psychologist (p < 0.001). These results reinforce the potential of ChatGPT as a supporting tool for the diagnostic step in psychiatry and provide general guidance for ensuring good performance when using it.
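
To make the compared prompting conditions concrete, below is a minimal sketch of how the three conditions could be issued to ChatGPT 3.5 via the OpenAI Python SDK. The exact prompt wordings and the model identifier `gpt-3.5-turbo` are assumptions for illustration, not the paper's verbatim materials.

```python
# Sketch of the three prompting conditions compared in the study.
# Prompt wordings are assumed, not quoted from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPTS = {
    # Condition reported to perform best: a single justified diagnosis.
    "simple": "Provide and justify a single diagnosis for the following case:\n\n{case}",
    # Ask for plausible diagnoses with a likelihood for each.
    "likelihood": "List the most plausible diagnoses for the following case, with a likelihood for each:\n\n{case}",
    # Role-play instruction.
    "role": "Act as a clinical psychologist and diagnose the following case:\n\n{case}",
}

def diagnose(case_text: str, condition: str) -> str:
    """Query ChatGPT 3.5 with one of the three prompting conditions."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": PROMPTS[condition].format(case=case_text)}],
        temperature=0,  # reduce sampling variability across cases
    )
    return response.choices[0].message.content
```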
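The reported inter-rater agreement is Cohen's kappa over the two psychologists' accuracy ratings; a minimal sketch of that computation with scikit-learn follows. The rating vectors are hypothetical placeholders, not the study's data.

```python
# Cohen's kappa between the two raters' accurate/inaccurate judgments.
# The vectors below are hypothetical; the study reports kappa = 0.687
# over 46 cases.
from sklearn.metrics import cohen_kappa_score

rater_a = [1, 1, 0, 1, 0, 1, 1, 0]  # 1 = generated diagnosis rated accurate
rater_b = [1, 0, 0, 1, 0, 1, 1, 1]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.3f}")
```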
