Evaluating large language models for predicting psychiatric acute readmissions from clinical notes of population-based EHR
Abstract
Psychiatric patients often have complex symptoms and anamneses recorded as unstructured clinical notes. Large language models (LLMs) now enable large-scale utilization of text data; however, there is a current lack of LLMs specialized for psychiatric clinical data, as well as for non-English data, hindering the application of LLMs across diverse clinical domains and countries. We present PsyRoBERTa: the first LLM specialized for clinical psychiatry, pretrained on population-based data comprising the currently largest collection of clinical notes of psychiatric relevance (∼44 million notes), covering the eastern half of Denmark. The model was evaluated against three publicly available models, pretrained on either public general- or medical-domain text, and a baseline logistic regression classifier. Through extensive evaluations, we investigated the effect of domain-specific pretraining on predicting acute readmissions to psychiatric hospitals, explored important features, and reflected on the (dis)advantages of LLMs. PsyRoBERTa outperformed the comparison models (AUC=0.74), captured information aligning with clinical practice, and additionally recognized psychiatric diagnoses (AUC=0.85). This demonstrates the importance of domain-specific pretraining and the potential of LLMs to leverage psychiatric clinical notes for enhancing prediction of psychiatric outcomes.
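The abstract describes fine-tuning a domain-pretrained RoBERTa-style encoder as a binary classifier for acute readmission and reporting AUC. The sketch below illustrates what such a fine-tuning and evaluation loop could look like with the Hugging Face Trainer API; the checkpoint name ("roberta-base" stands in for PsyRoBERTa, whose weights are not assumed to be public), the note/label records, and the hyperparameters are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch of fine-tuning an encoder for acute-readmission prediction and
# scoring it with AUC. Assumptions: "roberta-base" stands in for PsyRoBERTa,
# the note/label records and hyperparameters are placeholders, and this does not
# reproduce the paper's actual pipeline.
import numpy as np
from datasets import Dataset
from sklearn.metrics import roc_auc_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "roberta-base"  # placeholder for a domain-pretrained checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Placeholder records: each clinical note paired with a binary acute-readmission label.
train_notes = [{"note": "Discharge summary text ...", "label": 0},
               {"note": "Discharge summary text ...", "label": 1}]
eval_notes = [{"note": "Discharge summary text ...", "label": 0},
              {"note": "Discharge summary text ...", "label": 1}]

def tokenize(batch):
    # Truncate notes to the encoder's 512-token context window.
    return tokenizer(batch["note"], truncation=True, padding="max_length", max_length=512)

train_ds = Dataset.from_list(train_notes).map(tokenize, batched=True)
eval_ds = Dataset.from_list(eval_notes).map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # Softmax probability of the positive class (acute readmission) for AUC.
    probs = np.exp(logits)[:, 1] / np.exp(logits).sum(axis=1)
    return {"auc": roc_auc_score(labels, probs)}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="readmission-finetune",
                           per_device_train_batch_size=16,
                           num_train_epochs=3),
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())  # reports evaluation loss and AUC
```

The same evaluation loop applies unchanged to the baseline comparisons mentioned in the abstract: a general- or medical-domain checkpoint can be swapped in for MODEL_NAME, and a logistic regression baseline only needs a different featurization step before `roc_auc_score`.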