Simple Prompting Enhances ChatGPT’s Diagnostic Accuracy in Psychiatric Cases

Seraphina Fong
Alessandro Carollo
Martina Dal Maso
Giovanni Martinotti
Debora Luciani
Yasser Saeed Khan
Luca Pellegrini
Ornella Corazza
Gianluca Esposito

Abstract

Despite the centrality of diagnostic assessment in psychiatry, agreement among mental health practitioners often ranges from poor to moderate. The potential of Large Language Models (LLMs), such as ChatGPT, to serve as standardized tools supporting clinicians' decision-making has been studied alongside other approaches. The current work investigates the diagnostic accuracy of ChatGPT 3.5 (gpt-3.5) across different case presentation styles (i.e., vignette and outline) and prompting techniques. A total of 46 psychiatric cases, each with an accompanying reference diagnosis, were used. Two trained clinical psychologists evaluated the accuracy of each generated diagnosis against the reference diagnosis. A robust statistical approach was then used to investigate the effect of case format and prompt type on average diagnostic accuracy. The results showed moderate agreement between the ratings of the two clinical psychologists (kappa = 0.687). Moreover, a statistically significant main effect of prompting technique on gpt-3.5 diagnostic accuracy emerged (p = .009). The highest accuracy was achieved when gpt-3.5 was simply instructed to provide and justify a single diagnosis for each case, compared with when it was asked to provide a diagnosis likelihood (p < .001) or to act as a clinical psychologist (p = .001). These results reinforce the potential of LLMs as a supporting tool for the diagnostic step in psychiatry and offer general guidance for achieving good performance when using them. Additionally, this study offers a methodological framework that can serve as an example for future research aiming to systematically evaluate LLMs' diagnostic capabilities across different prompting strategies and case presentation formats.
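The inter-rater agreement above is reported as kappa = 0.687. The abstract does not state which kappa variant or rating categories were used, so the following is only a minimal sketch assuming the unweighted Cohen's kappa for two raters. It compares observed agreement p_o with the agreement expected by chance p_e from each rater's label frequencies, kappa = (p_o - p_e) / (1 - p_e). The function name `cohens_kappa` and the labels "correct", "partial", and "incorrect" are hypothetical toy data, not the study's rating scheme or results.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items.

    Assumption: categorical (nominal) ratings; this is an illustrative
    sketch, not the analysis used in the study.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions,
    # summed over labels (labels used by only one rater contribute zero).
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of six generated diagnoses by two raters.
rater_1 = ["correct", "correct", "incorrect", "partial", "correct", "incorrect"]
rater_2 = ["correct", "partial", "incorrect", "partial", "correct", "correct"]
kappa = cohens_kappa(rater_1, rater_2)
```

In this toy example the raters agree on 4 of 6 items (p_o = 24/36) while chance agreement is 13/36, giving kappa = 11/23, about 0.478. The correction for chance is why kappa is lower than raw percent agreement. For real analyses, scikit-learn's `sklearn.metrics.cohen_kappa_score` provides the same statistic, with an optional `weights` argument for ordinal scales.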

Version published to 10.31219/osf.io/fd8w5_v2 on OSF Preprints, Oct 13, 2025
Version published to 10.31219/osf.io/fd8w5 on OSF Preprints, Sep 11, 2024

Related articles

Automated Detection Of Clinical High Risk Population Of Schizophrenia: Assessing The Generalizability Of NLP And LLM-Based Methods

This article has 30 authors:
1. Jiaee Cheong
2. Cheryl M. Corcoran
3. Kathryn E. Lewandowski
4. Ofer Pasternak
5. Sinead Kelly
6. Sylvain Bouix
7. Abraham Reichenberg
8. Carrie E. Bearden
9. Guillermo Cecchi
10. Justin T. Baker
11. Marek Kubicki
12. Tina Kapur
13. Daniel H. Mathalon
14. Kang-Ik K. Cho
15. Inge Winter-van Rossum
16. Michael J. Coleman
17. Tashrif Billah
18. Dheshan Mohandass
19. Yoonho Chung
20. Habiballah Rahimi Eichi
21. Youngsun T. Cho
22. Zailyn Tamayo
23. Jessica Hartmann
24. Patrick D. McGorry
25. Rene S. Kahn
26. John M. Kane
27. Scott W. Woods
28. Martha E. Shenton
29. Barnaby Nelson
30. John Torous
This article has no evaluations. Latest version: Feb 4, 2026.

Assessment of the efficacy of ChatGPT responses to bacterial species-specific questions in microbiology.

This article has 6 authors:
1. Withanage Dona Manushi Dinasha Withanage
2. Nissanka Mudiyanselage Tanuri Ayanga Nissanka
3. Chamudhi Prabashi Wickramasinghe
4. Warnakulasuriya Palakuttige Pasindu Damsara Fernando
5. Vindya Perera
6. Gayan Danushka Gunatilake
Reviewed by Access Microbiology

This article has 2 evaluations. Latest version: Dec 9, 2025. Latest activity: Nov 5, 2025.

Challenges in the Diagnosis of Autism Spectrum Disorder: Contributions from Speech-Language Pathology

This article has 3 authors:
1. Renata Barros
2. Isabela Rodriguez
3. Eric Ferreira
This article has no evaluations. Latest version: Jan 12, 2026.
