Potential of ChatGPT in Youth Mental Health Emergency Triage: Comparative Analysis with Clinicians

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Background

Large language models (LLMs), such as GPT-4, are increasingly integrated into healthcare to support clinicians in making informed decisions. Given ChatGPT’s potential, it is necessary to explore such applications as a support tool, particularly within mental health telephone triage services. This study evaluates whether GPT-4 models can accurately triage psychiatric emergency vignettes and compares its performance to clinicians.

Methods

A cross-sectional study with qualitative analysis was conducted. Two clinical psychologists developed 22 psychiatric emergency vignettes. Responses were generated by three versions of GPT-4 (GPT-4o, GPT-4o Mini, GPT-4 Legacy) using ChatGPT, and two independent nurse practitioners (clinicians). The responses focused on three triage criteria: risk (Low 1-3 High), admission (Yes-1; No-2), and urgency (Low 1-3 High).

Results

Substantial interrater reliability was observed between clinicians and GPT-4 responses across the three triage criteria (Cohen’s Kappa: Admission = 0.77; Risk = 0.78; Urgency = 0.76). Among the GPT-4 models, Kappa values indicated moderate to substantial agreement (Fleiss’ Kappa: Admission = 0.69, Risk = 0.63, Urgency = 0.72). The mean scores for triage criteria responses between GPT-4 models and clinicians exhibited consistent patterns with minimal variability. Admission responses had a mean score of 1.73 (SD = 0.45), risk scores had a mean of 2.12 (SD= 0.83), and urgency scores averaged 2.27 (SD = 0.44).

Conclusion

This study suggests that GPT-4 models could be leveraged as a support tool in mental health telephone triage, particularly for psychiatric emergencies. While findings are promising, further research is required to confirm clinical relevance.

Article activity feed