Characterizing Documented Psychosocial Stressors in Pediatric Psychiatric Emergencies with an Open-Weight Large Language Model
Abstract
Objective
To evaluate whether a locally hosted open-weight large language model (LLM) can extract documented psychosocial factors from pediatric psychiatric intake notes, and to apply the validated extraction to a large emergency psychiatry cohort.
Materials and Methods
We identified emergency department presentations at Cincinnati Children’s Hospital Medical Center from January 1, 2016, through December 31, 2024, among patients <18 years with psychiatric billing diagnoses. Using full-text intake notes, gpt-oss:120b classified peer conflict, sleep disruption, and school-related academic, attendance, and disciplinary issues as detected, negated, or indeterminate. Four human raters independently reviewed 50 notes. We compared Fleiss’ κ among humans alone versus humans plus the LLM, assessed repeated-query stability across 50 independent calls per note, and applied the workflow to all eligible notes.
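The reliability comparison described above can be illustrated with a minimal sketch of Fleiss' κ. It assumes each note has one label per rater and that the LLM is treated as one additional rater in the human-plus-LLM condition. The function name `fleiss_kappa` and the example labels are illustrative and are not taken from the study's code.

```python
from collections import Counter

def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each a list of labels (one per rater).

    Assumes every item was rated by the same number of raters (at least 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]

    # Observed agreement: mean over items of the proportion of agreeing rater pairs.
    p_items = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the pooled category proportions.
    p_cat = [sum(c[k] for c in counts) / (n_items * n_raters) for k in categories]
    p_e = sum(p * p for p in p_cat)

    if p_e == 1.0:
        return float("nan")  # all ratings fall in one category; kappa is undefined
    return (p_bar - p_e) / (1.0 - p_e)

# Hypothetical binary coding (detected vs. not detected) for three notes, four raters.
humans = [
    ["detected", "detected", "detected", "not_detected"],
    ["not_detected"] * 4,
    ["detected"] * 4,
]
# Human-plus-LLM condition: append the (hypothetical) LLM label as a fifth rating.
llm = ["detected", "not_detected", "detected"]
humans_plus_llm = [h + [m] for h, m in zip(humans, llm)]

cats = ["detected", "not_detected"]
kappa_humans = fleiss_kappa(humans, cats)
kappa_plus_llm = fleiss_kappa(humans_plus_llm, cats)
```

Comparing `kappa_humans` with `kappa_plus_llm` mirrors the human-only versus human-plus-LLM contrast; a significance test for the difference is not shown here.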
Results
Among 37,315 eligible admissions, 22,284 had eligible intake notes; 22,270 produced parseable JSON. In detected-vs-not-detected coding, human-plus-LLM reliability did not differ significantly from human-only reliability across measures (human κ, 0.71–0.94; human-plus-LLM κ, 0.70–0.93). Stability was associated with human agreement: mean LLM-human agreement increased from 42.6% for classifications with <80% stability to 82.7% for 100% stability (Pearson r=0.36). Full-cohort extraction showed frequent and overlapping documented factors: sleep disruption was most frequently detected (57.7%), followed by peer conflict (47.2%), academic issues (43.4%), disciplinary issues (43.3%), and attendance issues (16.9%).
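The abstract does not define how repeated-query stability was computed. One plausible reading, sketched below as an assumption, is the share of parseable repeated outputs that match the most common label for a given note and factor. The JSON key `sleep_disruption` and the helper names `parse_label` and `stability` are hypothetical.

```python
import json
from collections import Counter

LABELS = {"detected", "negated", "indeterminate"}

def parse_label(raw, factor):
    """Return the label for `factor` from one raw model output, or None if unusable."""
    try:
        obj = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None  # output was not parseable JSON
    if not isinstance(obj, dict):
        return None
    label = obj.get(factor)
    return label if isinstance(label, str) and label in LABELS else None

def stability(raw_outputs, factor):
    """Return (modal_label, fraction of parseable outputs equal to the modal label)."""
    labels = [parse_label(r, factor) for r in raw_outputs]
    labels = [lab for lab in labels if lab is not None]
    if not labels:
        return None, 0.0
    modal, n_modal = Counter(labels).most_common(1)[0]
    return modal, n_modal / len(labels)

# Hypothetical repeated outputs for one note: four agree, one differs, one is unparseable.
outputs = (
    ['{"sleep_disruption": "detected"}'] * 4
    + ['{"sleep_disruption": "negated"}', "not json"]
)
modal_label, frac = stability(outputs, "sleep_disruption")
```

Under this definition, a classification with 100% stability is one where every parseable call returned the same label.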
Discussion
Agreement varied by construct and was strongest when repeated model outputs were stable.
Conclusion
Locally hosted open-weight LLMs can support scalable structured extraction of documented psychosocial factors from pediatric psychiatric intake notes after local validation.