Safety Mechanisms and Risk Mitigation in Generative AI Mental Health Chatbots: A Systematic Scoping Review

Abstract

Background: Generative AI (GenAI) mental health chatbots are increasingly being developed to help address persistent barriers to mental healthcare. Unlike earlier rule-based and retrieval-based systems, GenAI chatbots generate open-ended outputs that can be inaccurate and unsafe. Documented harms from general-purpose GenAI chatbots have highlighted the need for purpose-built interventions with dedicated safeguards, yet how safety is implemented in such interventions remains poorly understood.

Methods: This scoping review followed the Joanna Briggs Institute methodology and PRISMA-ScR guidelines, with a prospectively registered and peer-reviewed protocol. A systematic search of seven academic databases and search engines (MEDLINE, Scopus, PsycINFO, ACM Digital Library, IEEE Xplore, Google Scholar, and Consensus) was conducted in July 2025. Two reviewers independently screened records and extracted data. Safety mechanisms and risk mitigation strategies were narratively synthesized across three pre-specified domains: technical safeguards, pre-deployment safety considerations, and delivery-phase risk mitigation strategies.

Results: Twenty-one studies across 11 countries were included. Most interventions incorporated at least one technical safety mechanism, most commonly fine-tuning and prompt engineering. A smaller subset implemented layered safety architectures combining retrieval systems, content filters or risk classifiers, and rule-based algorithms. Pre-deployment safeguards included clinical expert and user co-design approaches, research ethics procedures, and data privacy measures. During intervention delivery, detailed onboarding with role clarification was common, but human oversight was limited. Crisis referral protocols varied in rigor but were mostly underdeveloped, and systematic adverse event monitoring was sparse. Documented safety failures included missed suicidal ideation and provision of inaccurate clinical information.

Conclusion: GenAI chatbot interventions require a robust sociotechnical approach that integrates technical safeguards with user co-design, procedural controls, and human oversight. Future research is needed to evaluate efficacy, improve safeguards, and standardize safety outcome measurement. Regulatory oversight proportional to the risks these systems carry is required to enable integration into stepped or blended mental healthcare.
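To make the "layered safety architecture" pattern described in the Results more concrete, the minimal sketch below illustrates one plausible arrangement of the components the review identifies: a rule-based crisis screen, a learned risk classifier, and a post-generation content filter wrapped around a generative model call. All names here (`check_crisis_keywords`, `risk_classifier`, `generate_reply`, `output_filter`) are hypothetical placeholders for illustration; they are not components reported by any included study, and a deployed system would substitute trained models and clinically validated referral pathways.

```python
import re
from dataclasses import dataclass

# Layer 1: deterministic, auditable rules (illustrative patterns only).
CRISIS_PATTERNS = [
    r"\bkill myself\b",
    r"\bsuicid(e|al)\b",
    r"\bend my life\b",
]

CRISIS_REFERRAL = (
    "It sounds like you may be in crisis. Please contact a crisis line, "
    "such as 988 in the US, or your local emergency services."
)

@dataclass
class SafetyResult:
    reply: str
    escalated: bool  # True when routed to crisis referral / human oversight

def check_crisis_keywords(message: str) -> bool:
    """Rule-based screen: cheap, deterministic first layer."""
    return any(re.search(p, message, re.IGNORECASE) for p in CRISIS_PATTERNS)

def risk_classifier(message: str) -> float:
    """Placeholder for a learned risk classifier (assumption: returns a
    self-harm risk probability). A real system would call a trained model."""
    return 0.0

def generate_reply(message: str) -> str:
    """Placeholder for the generative model call (assumption)."""
    return "Thanks for sharing. Can you tell me more about how you're feeling?"

def output_filter(reply: str) -> str:
    """Post-generation filter: block clinical advice the chatbot should not
    give (a single illustrative rule; real filters are far broader)."""
    if re.search(r"\b(dosage|prescri(be|ption))\b", reply, re.IGNORECASE):
        return "I can't advise on medication. Please speak with your clinician."
    return reply

def safe_respond(message: str, risk_threshold: float = 0.8) -> SafetyResult:
    # Rule layer runs before any generation, so a crisis referral cannot
    # be bypassed by a model failure.
    if check_crisis_keywords(message):
        return SafetyResult(CRISIS_REFERRAL, escalated=True)
    # Classifier layer catches phrasing the keyword rules miss.
    if risk_classifier(message) >= risk_threshold:
        return SafetyResult(CRISIS_REFERRAL, escalated=True)
    # Generate, then filter the model's output before it reaches the user.
    return SafetyResult(output_filter(generate_reply(message)), escalated=False)

if __name__ == "__main__":
    print(safe_respond("I've been feeling low lately").reply)
    print(safe_respond("I want to end my life").reply)
```

The ordering is the design point: the deterministic layer gates the probabilistic ones, and the `escalated` flag gives downstream human-oversight and adverse-event-monitoring processes a hook, addressing two of the gaps (limited oversight, sparse monitoring) the review reports.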
