Evaluating Voice-Enabled Generative AI for Mental Health: Real-Time Performance and Safety Analyses

Abstract

This study investigates the integration of Voice AI into a locally hosted generative AI chatbot designed to function as a mental health assistant, with the goal of enabling intuitive, voice-based therapeutic interaction. Leveraging the Llama 3.1 8B language model for privacy-preserving generation, the system combines Deepgram’s Speech-to-Text API and OpenAI’s Text-to-Speech API within a WebRTC-based framework to support low-latency, bidirectional communication. A custom pipeline facilitates real-time voice input and output, aiming to reduce barriers to engagement and foster a more natural conversational flow. Technical evaluation focuses on latency across short, long-form, and multi-turn dialogues, revealing response times within tolerable bounds for synchronous use. Prompt engineering and system prompt customization guide empathetic, context-aware responses in standard therapeutic scenarios, though limitations persist in handling edge cases. These findings suggest that locally hosted voice-enabled LLMs can support responsive, privacy-conscious mental health applications, with future work directed toward fine-tuning for high-risk interactions.
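
To make the pipeline concrete, the following sketch outlines a single conversational turn of the architecture described above: Deepgram's pre-recorded transcription endpoint for speech-to-text, a locally hosted Llama 3.1 8B model for generation, and OpenAI's Text-to-Speech API for the spoken reply. It is illustrative only and reflects assumptions not stated in the abstract: Ollama is assumed as the local model runtime, the system prompt text, voice selection, and file names are hypothetical placeholders, and the WebRTC transport and streaming logic are omitted.

# Minimal sketch of one voice turn: speech-to-text via Deepgram,
# generation via a locally hosted Llama 3.1 8B model (assumed to be
# served by Ollama on its default port), and synthesis via OpenAI's
# Text-to-Speech API. WebRTC transport and streaming are omitted.
import os
import requests
from openai import OpenAI

DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]
OLLAMA_URL = "http://localhost:11434/api/chat"   # assumed local runtime (Ollama)
openai_client = OpenAI()                          # reads OPENAI_API_KEY

# Hypothetical system prompt guiding empathetic, context-aware responses.
SYSTEM_PROMPT = (
    "You are a supportive mental health assistant. Respond with empathy, "
    "ask open-ended questions, and never give medical diagnoses."
)

def transcribe(audio_bytes: bytes) -> str:
    """Send a recorded utterance to Deepgram's pre-recorded transcription endpoint."""
    resp = requests.post(
        "https://api.deepgram.com/v1/listen",
        headers={
            "Authorization": f"Token {DEEPGRAM_API_KEY}",
            "Content-Type": "audio/wav",
        },
        data=audio_bytes,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"]

def generate_reply(history: list[dict], user_text: str) -> str:
    """Query the locally hosted Llama 3.1 8B model with the running dialogue history."""
    history.append({"role": "user", "content": user_text})
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "llama3.1:8b", "messages": history, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    reply = resp.json()["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

def synthesize(text: str) -> bytes:
    """Convert the model's reply to speech with OpenAI's Text-to-Speech API."""
    speech = openai_client.audio.speech.create(
        model="tts-1", voice="alloy", input=text
    )
    return speech.content

def handle_turn(history: list[dict], audio_bytes: bytes) -> bytes:
    """One conversational turn: audio in -> transcript -> reply -> audio out."""
    transcript = transcribe(audio_bytes)
    reply = generate_reply(history, transcript)
    return synthesize(reply)

if __name__ == "__main__":
    history = [{"role": "system", "content": SYSTEM_PROMPT}]
    with open("user_turn.wav", "rb") as f:        # placeholder input recording
        reply_audio = handle_turn(history, f.read())
    with open("assistant_reply.mp3", "wb") as f:  # tts-1 returns MP3 by default
        f.write(reply_audio)

In a deployed system, the per-turn functions above would be driven by the WebRTC audio track rather than files, which is where the latency measurements for short, long-form, and multi-turn dialogues would be taken.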
