Large language models for depression assessment from brief daily diaries
Abstract
Importance: Self-report measures have limitations for intensive longitudinal assessment of depression. Scoring depression from brief natural language samples with large language models (LLMs) may address these limitations, but this method has not yet been rigorously evaluated.
Objective: To test the convergent, construct, and incremental validity of depression ratings made by LLMs from daily video diaries.
Design: Cross-sectional study conducted from 2016-2018, including a baseline clinical interview and 14-21 days of daily self-report surveys and video diaries.
Setting: Participants were recruited from a clinical research registry and from the community via flyers.
Participants: Volunteer sample selected for a range of psychopathology and mental health treatment status (45% currently in outpatient treatment).
Main outcomes and measures: Daily depression scores estimated by LLMs from video diary transcripts; daily depression, affect, and stress exposure rated by self-report; depression severity assessed by clinical interview.
Results: Among the 108 participants included in the analysis, 53% were female and the mean age was 28 (SD = 6.1). LLM ratings of depression from open-ended narratives converged with self-reports of depression both within-person, across days (r = .45; 95% CI, .40-.49), and when averaged across days for an overall depression score (r = .61; 95% CI, .43-.75). Averaged LLM depression ratings also correlated with major depressive disorder symptom severity ascertained by clinical interview (r = .53; 95% CI, .35-.68). Daily depression rated by LLM and by self-report had similar profiles of associations with daily functioning variables (profile r = .98 within-person; r = .94 between-person). Multivariable regression results showed that LLM depression ratings were associated with daily functioning variables (partial βs = |.12| - |.28|) and with interview-based depression ratings (partial β = .33; 95% CI, .07-.58) over and above self-reports.
Conclusions and relevance: Results support the validity of LLMs for measuring day-to-day changes in depression and overall levels of depression from brief, open-ended daily audio diaries. These findings establish a strong foundation for a low-burden depression assessment approach that satisfies the unique demands of intensive longitudinal monitoring and addresses critical shortcomings of self-report surveys.
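The Results distinguish within-person convergence (day-to-day fluctuations around each participant's own mean) from between-person convergence (scores averaged across days). The sketch below illustrates that distinction with plain Pearson correlations on person-mean-centered and person-averaged scores. It is a simplified illustration, not the authors' analysis, which the abstract does not specify in detail. The function names, the row layout, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def within_between(rows):
    """Return (within_r, between_r) for daily paired ratings.

    rows: iterable of (person_id, llm_score, self_report_score),
    one tuple per person-day (hypothetical layout for this sketch).
    """
    by_person = defaultdict(list)
    for pid, llm, sr in rows:
        by_person[pid].append((llm, sr))

    mean_llm, mean_sr = [], []  # one value per person
    dev_llm, dev_sr = [], []    # one value per person-day
    for days in by_person.values():
        m_llm = sum(d[0] for d in days) / len(days)
        m_sr = sum(d[1] for d in days) / len(days)
        mean_llm.append(m_llm)
        mean_sr.append(m_sr)
        # Person-mean centering removes stable between-person differences.
        dev_llm.extend(d[0] - m_llm for d in days)
        dev_sr.extend(d[1] - m_sr for d in days)

    return pearson(dev_llm, dev_sr), pearson(mean_llm, mean_sr)


# Toy data invented for illustration; not from the study.
rows = [
    ("p1", 1.0, 2.0), ("p1", 2.0, 3.0), ("p1", 3.0, 4.0),
    ("p2", 5.0, 1.0), ("p2", 6.0, 2.0), ("p2", 7.0, 3.0),
    ("p3", 9.0, 0.0), ("p3", 10.0, 1.0), ("p3", 11.0, 2.0),
]
within_r, between_r = within_between(rows)
```

In this toy data the two ratings rise and fall together from day to day within each person, while people with higher average LLM scores have lower average self-report scores, so the within-person and between-person correlations take opposite signs. This is why the two levels are reported separately. A multilevel model would typically be preferred for inference on real data.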