Safety Mechanisms and Risk Mitigation in Generative AI Mental Health Chatbots: A Systematic Scoping Review

Abstract

Background: Generative AI (GenAI) mental health chatbots are increasingly being developed to help address persistent barriers to mental healthcare. Unlike earlier rule-based and retrieval-based systems, GenAI chatbots generate open-ended outputs that can be inaccurate and unsafe. Documented harms from general-purpose GenAI chatbots have highlighted the need for purpose-built interventions with dedicated safeguards, yet how safety is implemented in such interventions remains poorly understood.

Methods: This scoping review followed the Joanna Briggs Institute methodology and PRISMA-ScR guidelines, with a prospectively registered and peer-reviewed protocol. A systematic search of seven academic databases and search engines (MEDLINE, Scopus, PsycINFO, ACM Digital Library, IEEE Xplore, Google Scholar, and Consensus) was conducted in July 2025. Two reviewers independently screened records and extracted data. Safety mechanisms and risk mitigation strategies were narratively synthesized across three pre-specified domains: technical safeguards, pre-deployment safety considerations, and delivery-phase risk mitigation strategies.

Results: Twenty-one studies across 11 countries were included. Most interventions incorporated at least one technical safety mechanism, most commonly fine-tuning and prompt engineering. A smaller subset implemented layered safety architectures combining retrieval systems, content filters or risk classifiers, and rule-based algorithms. Pre-deployment safeguards included clinical expert and user co-design approaches, research ethics procedures, and data privacy measures. During intervention delivery, detailed onboarding with role clarification was common, but human oversight was limited. Crisis referral protocols varied in rigor but were mostly underdeveloped, and systematic adverse event monitoring was sparse. Documented safety failures included missed suicidal ideation and provision of inaccurate clinical information.

Conclusion: GenAI chatbot interventions require a robust sociotechnical approach that integrates technical safeguards with user co-design, procedural controls, and human oversight. Future research is needed to evaluate efficacy, improve safeguards, and standardize safety outcome measurement. Regulatory oversight proportional to the risks these systems carry is required to enable integration into stepped or blended mental healthcare.
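To make the "layered safety architecture" pattern described in the Results more concrete, the minimal sketch below illustrates one plausible arrangement of the components the review identifies: a rule-based crisis screen, a learned risk classifier, and a post-generation content filter wrapped around a generative model call. All names here (`check_crisis_keywords`, `risk_classifier`, `generate_reply`, `output_filter`) are hypothetical placeholders for illustration; they are not components reported by any included study, and a deployed system would substitute trained models and clinically validated referral pathways.

```python
import re
from dataclasses import dataclass

# Layer 1: deterministic, auditable rules (illustrative patterns only).
CRISIS_PATTERNS = [
    r"\bkill myself\b",
    r"\bsuicid(e|al)\b",
    r"\bend my life\b",
]

CRISIS_REFERRAL = (
    "It sounds like you may be in crisis. Please contact a crisis line, "
    "such as 988 in the US, or your local emergency services."
)

@dataclass
class SafetyResult:
    reply: str
    escalated: bool  # True when routed to crisis referral / human oversight

def check_crisis_keywords(message: str) -> bool:
    """Rule-based screen: cheap, deterministic first layer."""
    return any(re.search(p, message, re.IGNORECASE) for p in CRISIS_PATTERNS)

def risk_classifier(message: str) -> float:
    """Placeholder for a learned risk classifier (assumption: returns a
    self-harm risk probability). A real system would call a trained model."""
    return 0.0

def generate_reply(message: str) -> str:
    """Placeholder for the generative model call (assumption)."""
    return "Thanks for sharing. Can you tell me more about how you're feeling?"

def output_filter(reply: str) -> str:
    """Post-generation filter: block clinical advice the chatbot should not
    give (a single illustrative rule; real filters are far broader)."""
    if re.search(r"\b(dosage|prescri(be|ption))\b", reply, re.IGNORECASE):
        return "I can't advise on medication. Please speak with your clinician."
    return reply

def safe_respond(message: str, risk_threshold: float = 0.8) -> SafetyResult:
    # Rule layer runs before any generation, so a crisis referral cannot
    # be bypassed by a model failure.
    if check_crisis_keywords(message):
        return SafetyResult(CRISIS_REFERRAL, escalated=True)
    # Classifier layer catches phrasing the keyword rules miss.
    if risk_classifier(message) >= risk_threshold:
        return SafetyResult(CRISIS_REFERRAL, escalated=True)
    # Generate, then filter the model's output before it reaches the user.
    return SafetyResult(output_filter(generate_reply(message)), escalated=False)

if __name__ == "__main__":
    print(safe_respond("I've been feeling low lately").reply)
    print(safe_respond("I want to end my life").reply)
```

The ordering is the design point: the deterministic layer gates the probabilistic ones, and the `escalated` flag gives downstream human-oversight and adverse-event-monitoring processes a hook, addressing two of the gaps (limited oversight, sparse monitoring) the review reports.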
