clickBrick Prompt Engineering: Optimizing Large Language Model Performance in Clinical Psychiatry

Abstract

Background
Prompt engineering has the potential to enhance large language models' (LLMs') ability to solve tasks through improved in-context learning. In clinical research, LLMs have shown expert-level performance on a variety of tasks, ranging from pathology slide classification to identifying suicidality. We introduce clickBrick, a modular prompt-engineering framework, and rigorously test its effectiveness.

Methods
We explore the effects of increasingly structuring prompts with the clickBrick framework for a comprehensive psychopathological assessment of 100 index patients from psychiatric electronic health records. We compare the performance of a locally run LLM against an expert-labelled ground truth across a series of successively built-up prompts for the extraction of 12 transdiagnostic psychopathological criteria. Potential clinical value was explored by training linear support vector machines on outputs from the strongest and weakest prompts to predict discharge ICD-10 main diagnoses for a historical sample of 1,692 patients.

Outcomes
We reliably extracted information across 12 distinct psychopathological classification tasks from unstructured clinical text, with balanced accuracies ranging from 71% to 94%. Across tasks, clickBrick substantially improved extraction accuracy (by +19% to +36%). The comparison revealed substantial variation between prompts, with a reasoning prompt performing best in 7 of 12 domains. Clinical value and internal validity were approximated by downstream classification of eventual psychiatric diagnoses for 1,692 patients, where clickBrick improved overall classification accuracy from 71% to 76%.

Interpretation
clickBrick prompt engineering, i.e. iterative, expert-led prompt design and testing, is critical for unlocking LLMs' clinical potential. The framework offers a reproducible pathway for deploying trustworthy generative AI across mental health and other clinical fields.

Funding
The German Ministry of Research, Technology and Space and the German Research Foundation.
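The abstract does not include the authors' prompt texts or modelling code, so the sketch below only illustrates the two techniques it describes: stacking modular prompt "bricks" into successively more structured prompts, and training a linear support vector machine on the extracted criteria to predict discharge diagnoses. All identifiers and brick wordings (BRICKS, PROMPT_VARIANTS, build_prompt, evaluate_extractions) are hypothetical assumptions; only the use of linear SVMs and the balanced-accuracy metric for extraction comes from the abstract.

```python
# Hypothetical sketch of brick-style prompt assembly and the downstream
# evaluation described in the abstract. Brick names, wordings, and all
# function names are illustrative assumptions, not the authors' code.
from sklearn.svm import LinearSVC
from sklearn.metrics import balanced_accuracy_score

# Prompt "bricks": independent text modules that can be stacked to form
# increasingly structured prompts (role, task, reasoning, output format).
BRICKS = {
    "role": "You are a clinical psychiatrist reviewing a patient record.",
    "task": "Decide whether the following psychopathological criterion is present.",
    "reasoning": "Reason step by step through the evidence before answering.",
    "format": "Answer with exactly one word: 'present' or 'absent'.",
}

# Successively built-up prompt variants, from minimal to fully structured.
PROMPT_VARIANTS = {
    "minimal": ["task", "format"],
    "role": ["role", "task", "format"],
    "reasoning": ["role", "task", "reasoning", "format"],
}

def build_prompt(variant: str, criterion: str, note: str) -> str:
    """Stack the bricks of one variant, then append criterion and note."""
    parts = [BRICKS[name] for name in PROMPT_VARIANTS[variant]]
    parts.append(f"Criterion: {criterion}")
    parts.append(f"Clinical note: {note}")
    return "\n\n".join(parts)

def evaluate_extractions(X_train, y_train, X_test, y_test) -> float:
    """Train a linear SVM on binary criterion extractions (one row per
    patient, one column per criterion) to predict discharge diagnoses.
    Balanced accuracy is used here as an example metric; the abstract
    reports it for extraction and overall accuracy for diagnoses."""
    clf = LinearSVC().fit(X_train, y_train)
    return balanced_accuracy_score(y_test, clf.predict(X_test))

if __name__ == "__main__":
    print(build_prompt("reasoning",
                       criterion="formal thought disorder",
                       note="Patient shows loosened associations ..."))
```

In this sketch, the fully structured "reasoning" variant stands in for the reasoning prompt the abstract reports as strongest in 7 of 12 domains; swapping brick lists in PROMPT_VARIANTS lets the same code express an ablation-style comparison of successively built-up prompts.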
