Large language models for depression assessment from brief daily diaries

Abstract

Importance: Self-report measures have limitations for intensive longitudinal assessment of depression. Scoring depression from brief natural language samples with large language models (LLMs) may address these limitations, but this method has not yet been rigorously evaluated.

Objective: To test the convergent, construct, and incremental validity of depression ratings made by LLMs from daily video diaries.

Design: Cross-sectional study conducted from 2016-2018, including a baseline clinical interview and 14-21 days of daily self-report surveys and video diaries.

Setting: Participants were recruited from a clinical research registry and from the community via flyers.

Participants: Volunteer sample selected for a range of psychopathology and mental health treatment status (45% currently in outpatient treatment).

Main outcomes and measures: Daily depression scores estimated by LLMs from video diary transcripts; daily depression, affect, and stress exposure rated by self-report; depression severity assessed by clinical interview.

Results: Among the 108 participants included in the analysis, 53% were female and the mean age was 28 (SD = 6.1). LLM ratings of depression from open-ended narratives converged with self-reports of depression within-person, across days (r = .45; 95% CI, .40-.49) and when averaged across days for an overall depression score (r = .61; 95% CI, .43-.75). Averaged LLM depression ratings also correlated with major depressive disorder symptom severity ascertained by clinical interview (r = .53; 95% CI, .35-.68). Daily depression rated by LLM and by self-report had similar profiles of associations with daily functioning variables (profile r = .98 within-person; r = .94 between-person). Multivariable regression results showed that LLM depression ratings were associated with daily functioning variables (partial |β|s = .12-.28) and interview-based depression ratings (partial β = .33; 95% CI, .07-.58) over and above self-reports.

Conclusions and relevance: Results support the validity of using LLMs to measure day-to-day changes in depression and overall levels of depression from brief, open-ended daily audio diaries. These findings establish a strong foundation for a low-burden depression assessment approach that satisfies the unique demands of intensive longitudinal monitoring and addresses critical shortcomings of self-report surveys.
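The distinction between the within-person and between-person correlations reported above can be illustrated with a minimal sketch. It assumes a simple person-mean-centering approach: deviations from each person's own mean give the within-person correlation, and the person means give the between-person correlation. The abstract does not state how the authors estimated these values (they may have used multilevel models), so the function names, the record layout, and the toy data below are all hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def within_between(records):
    """Split an association into within- and between-person parts.

    records: iterable of (person_id, llm_score, self_report_score),
    one tuple per diary day (hypothetical layout, not from the paper).
    Returns (within_r, between_r).
    """
    by_person = {}
    for pid, llm, sr in records:
        by_person.setdefault(pid, []).append((llm, sr))

    llm_means, sr_means = [], []  # one entry per person
    llm_dev, sr_dev = [], []      # one entry per person-day
    for days in by_person.values():
        m_llm = mean(d[0] for d in days)
        m_sr = mean(d[1] for d in days)
        llm_means.append(m_llm)
        sr_means.append(m_sr)
        for llm, sr in days:
            # Person-mean centering removes stable differences between people.
            llm_dev.append(llm - m_llm)
            sr_dev.append(sr - m_sr)

    return pearson(llm_dev, sr_dev), pearson(llm_means, sr_means)


# Toy data invented for illustration only: three people, three days each.
toy = [
    ("A", 1, 2), ("A", 2, 3), ("A", 3, 4),
    ("B", 5, 3), ("B", 6, 2), ("B", 7, 1),
    ("C", 9, 6), ("C", 10, 7), ("C", 11, 8),
]
within_r, between_r = within_between(toy)
print(f"within-person r = {within_r:.2f}, between-person r = {between_r:.2f}")
```

In the toy data, person B's day-to-day scores move in opposite directions, which lowers the within-person correlation without much affecting the between-person correlation of averages. This is why the two values are reported separately.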
