Language-Based Detection of Depression with Machine Learning: Systematic Review and Meta-Analysis
Abstract
Early detection of depression is critical for timely intervention. Natural language processing (NLP) and machine learning (ML) approaches have increasingly been used to automatically detect depression from text data, yet comprehensive evidence regarding their diagnostic performance remains limited. We systematically reviewed and meta-analyzed studies applying NLP and ML to identify depression from spoken or written language. Six electronic databases and additional sources were searched, yielding 892 full-text articles, of which 123 met inclusion criteria. One representative result per dataset was selected for quantitative synthesis, resulting in 50 independent studies. Pooled accuracy across studies (k = 43; n = 40,983) was 0.80 (95% CI, 0.76–0.83). Pooled precision (k = 28) was 0.78 (95% CI, 0.72–0.83), recall (k = 33) was 0.76 (95% CI, 0.68–0.83), AUC (k = 14) was 0.79 (95% CI, 0.70–0.85), and balanced accuracy (k = 16) was 0.71 (95% CI, 0.63–0.78). Subgroup analyses showed significant differences by language, text source, feature type, and classifier (all p < .001). Accuracy was highest in studies using structured clinical interviews, non-English languages, and linguistic or embedding-based features. However, in one-at-a-time meta-regressions, only text source remained a significant predictor (QM(3) = 8.78, p = .032), explaining 13.6% of the between-study variance. Publication bias was minimal. Automated depression detection from text shows promising performance but with substantial heterogeneity: performance varies by language, data source, feature extraction, and model type. The findings highlight both the current limitations and the potential of text-based depression detection and underscore the need for methodological standardization and validation before clinical use.
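To make the pooled estimates above concrete, the following is a minimal sketch of how per-study accuracies can be combined into one estimate with a 95% confidence interval. The abstract does not state which pooling model or transformation was used, so the logit transformation and the DerSimonian-Laird random-effects estimator here are assumptions chosen because they are common for proportions. The function name `pool_accuracy` and the example values are hypothetical and are not data from the review.

```python
import math


def logit(p):
    """Map a proportion in (0, 1) to the log-odds scale."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_accuracy(accuracies, sample_sizes, z=1.96):
    """Random-effects pooling of accuracies (assumed method, see lead-in).

    Accuracies are logit-transformed, weighted by inverse variance, and
    combined with the DerSimonian-Laird estimate of between-study variance.
    """
    if len(accuracies) != len(sample_sizes) or len(accuracies) < 2:
        raise ValueError("need at least two studies with matching sizes")
    if any(not 0.0 < p < 1.0 for p in accuracies):
        raise ValueError("accuracies must lie strictly between 0 and 1")

    y = [logit(p) for p in accuracies]
    # Approximate sampling variance of a logit-transformed proportion.
    v = [1.0 / (n * p * (1.0 - p)) for p, n in zip(accuracies, sample_sizes)]

    # Fixed-effect weights and Cochran's Q heterogeneity statistic.
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights, pooled estimate, and its standard error.
    w_re = [1.0 / (vi + tau2) for vi in v]
    sum_w_re = sum(w_re)
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum_w_re
    se = math.sqrt(1.0 / sum_w_re)

    # I^2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    return {
        "pooled": inv_logit(mu),
        "ci_low": inv_logit(mu - z * se),
        "ci_high": inv_logit(mu + z * se),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical per-study accuracies and sample sizes, for illustration only.
example = pool_accuracy([0.72, 0.85, 0.78, 0.90], [200, 150, 400, 120])
```

Pooling on the logit scale keeps the back-transformed interval inside (0, 1), which matters when accuracies are close to 1. A full analysis would normally rely on an established meta-analysis package rather than a hand-written estimator.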