Characterizing Documented Psychosocial Stressors in Pediatric Psychiatric Emergencies with an Open-Weight Large Language Model

Abstract

Objective

To evaluate whether a locally hosted open-weight large language model (LLM) can extract documented psychosocial factors from pediatric psychiatric intake notes, and to apply the validated extraction to a large emergency psychiatry cohort.

Materials and Methods

We identified emergency department presentations at Cincinnati Children’s Hospital Medical Center from January 1, 2016, through December 31, 2024, among patients <18 years with psychiatric billing diagnoses. Using full-text intake notes, gpt-oss:120b classified peer conflict, sleep disruption, and school-related academic, attendance, and disciplinary issues as detected, negated, or indeterminate. Four human raters independently reviewed 50 notes. We compared Fleiss’ κ among humans alone versus humans plus the LLM, assessed repeated-query stability across 50 independent calls per note, and applied the workflow to all eligible notes.
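As a minimal sketch of the two reliability measures named above, the following Python computes Fleiss’ κ over a set of notes and a repeated-query stability score for a single note. The abstract does not define stability; the definition used here (share of repeated calls that return the modal label) is an assumption, and the function names are hypothetical rather than taken from the study's code.

```python
import math
from collections import Counter

# The three labels named in the methods.
LABELS = ("detected", "negated", "indeterminate")


def fleiss_kappa(ratings, categories=LABELS):
    """Fleiss' kappa for a list of notes, each a list with one label per rater.

    Every note must be rated by the same number of raters (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each note needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]

    # Observed agreement per note, then averaged over notes.
    p_items = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall share of each category.
    p_cat = [sum(c[k] for c in counts) / (n_items * n_raters) for k in categories]
    p_e = sum(p ** 2 for p in p_cat)

    if math.isclose(p_e, 1.0):
        return math.nan  # kappa is undefined when only one category is ever used
    return (p_bar - p_e) / (1.0 - p_e)


def stability(repeated_labels):
    """Assumed definition: modal label and the share of calls that returned it."""
    modal_label, modal_count = Counter(repeated_labels).most_common(1)[0]
    return modal_label, modal_count / len(repeated_labels)
```

To compare human-only with human-plus-LLM reliability under this sketch, one would call `fleiss_kappa` once on the four human labels per note and once with the model's label appended as a fifth rating.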

Results

Among 37,315 eligible admissions, 22,284 had eligible intake notes; 22,270 produced parseable JSON. In detected-vs-not-detected coding, human-plus-LLM reliability did not differ significantly from human-only reliability across measures (human κ, 0.71–0.94; human-plus-LLM κ, 0.70–0.93). Stability was associated with human agreement: mean LLM-human agreement increased from 42.6% for classifications with <80% stability to 82.7% for 100% stability (Pearson r=0.36). Full-cohort extraction showed frequent and overlapping documented factors: sleep disruption was most frequently detected (57.7%), followed by peer conflict (47.2%), academic issues (43.4%), disciplinary issues (43.3%), and attendance issues (16.9%).
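The parseable-JSON check and the detected-vs-not-detected coding can be illustrated with a short Python sketch. The output schema is not given in the abstract, so the factor keys below are hypothetical; only the three label values and the five factors come from the text.

```python
import json

# Hypothetical JSON keys for the five factors; the actual schema is not stated.
FACTORS = (
    "peer_conflict",
    "sleep_disruption",
    "academic_issues",
    "attendance_issues",
    "disciplinary_issues",
)
LABELS = {"detected", "negated", "indeterminate"}


def parse_response(raw):
    """Return {factor: label} if the model output is valid JSON, else None."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(data, dict):
        return None
    parsed = {}
    for factor in FACTORS:
        value = data.get(factor)
        if not isinstance(value, str) or value not in LABELS:
            return None  # missing key or unexpected label
        parsed[factor] = value
    return parsed


def to_binary(labels):
    """Collapse three-way labels: negated and indeterminate both count as not detected."""
    return {factor: label == "detected" for factor, label in labels.items()}
```

Under this sketch, a note whose response fails `parse_response` would be excluded from the extraction counts, which is one plausible reading of the gap between notes with eligible text and notes with parseable output.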

Discussion

Agreement varied by construct and was strongest when repeated model outputs were stable.

Conclusion

Locally hosted open-weight LLMs can support scalable structured extraction of documented psychosocial factors from pediatric psychiatric intake notes after local validation.
