Automated speech content analysis to detect depression with large language models: towards multilingual and few-shot capabilities
Abstract
Large Language Models (LLMs) offer potential solutions for scalable depression detection across diverse populations. This study evaluates LLM-based speech content analysis for multilingual depression detection in clinical and general populations. We analyzed speech transcripts from three distinct cohorts: Chinese clinical (n = 52), Italian clinical (n = 116), and French general population (n = 1,347). Our LLM-based system, built on a state-of-the-art open-source LLM with few-shot prompting, was compared against traditional audio-embedding and text-embedding approaches for detecting depression and secondary symptoms (anxiety, insomnia, fatigue). The LLM system achieved depression-detection F1-scores of 0.96 (Chinese), 0.85 (Italian), and 0.40 (French), consistently outperforming the baseline methods. Depression sensitivity reached 1.00 (Chinese) and 0.93 (French), with high specificity in the clinical populations (0.93 Chinese, 0.88 Italian). Among secondary symptoms, anxiety detection performed well, with high sensitivity (0.85 Chinese, 0.97 French) and F1-scores of 0.78 (Chinese) and 0.31 (French), while performance varied for the other symptoms; fatigue detection was near random. Statistical analysis revealed language-dependent benefits from few-shot learning, with the Chinese cohort benefiting particularly from additional examples when larger models were used. Our findings demonstrate that LLM-based speech analysis provides robust multilingual capabilities for depression detection without requiring language-specific training data, offering a scalable solution for mental health screening across diverse populations.
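The abstract reports detection performance as sensitivity, specificity, and F1-score over binary depression labels. A minimal sketch of how these screening metrics are derived from predicted and reference labels is shown below; the labels are purely illustrative and not data from the study.

```python
# Sketch: computing sensitivity, specificity, and F1 for binary
# depression screening labels (1 = depressed, 0 = not depressed).
# The example labels below are made up for illustration only.

def screening_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, F1) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true-positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true-negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    return sensitivity, specificity, f1

# Illustrative usage (hypothetical labels):
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec, f1 = screening_metrics(y_true, y_pred)  # each 0.75 here
```

In practice a library such as scikit-learn provides equivalent metrics; the hand-rolled version above only makes the definitions behind the reported numbers explicit.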