A Scoping Review of Generative AI in Mental Health Support
Abstract
Millions of people are using generative large language models (LLMs) for mental health support in a de facto, unregulated public health intervention, and many LLM applications are being developed for this purpose. Amid reports of both therapeutic benefit and harm, the scientific evidence supporting these applications remains inconclusive.

Following PRISMA-ScR guidelines, we conducted a scoping review of PubMed, Web of Science, ACM Digital Library, IEEE Xplore, and Google Scholar from June 2017 to July 2025. We included peer-reviewed, empirical studies of transformer-based LLMs used to deliver, augment, or analyze mental health support. We extracted data on sample composition, study design, the adoption of responsible evaluation practices, and model and dataset choices.

We identified 132 studies. Of the 36 client-facing studies with human participants, most were small (median n = 42), uncontrolled (26/36), recruited participants without a diagnosed mental disorder (35/36), and relied on user experience metrics (23/36) rather than clinical outcomes (12/36). Responsible evaluation practices were generally not implemented, including safety protocols for risk detection (18%) and potentially harmful content (16%); real-world implementation was rarely addressed (7%). Across the two highest-quality controlled clinical studies, effect sizes for symptom reduction ranged from d = .44 to .90.

This gap between widespread public adoption and the limited evidence base underscores the need for robust methodological standards, including rigorous clinical trials, a focus on safety and implementation, and standardized, clinically meaningful benchmarks. Stronger evidence is needed to demonstrate that generative AI systems can safely and meaningfully improve access to mental health treatment.