Evaluating Voice-Enabled Generative AI for Mental Health: Real-Time Performance and Safety Analyses
Abstract
This study investigates the integration of voice AI into a locally hosted generative AI chatbot designed to function as a mental health assistant, with the goal of enabling intuitive, voice-based therapeutic interaction. The system uses the Llama 3.1 8B language model for privacy-preserving generation and combines Deepgram’s Speech-to-Text API and OpenAI’s Text-to-Speech API within a WebRTC-based framework to support low-latency, bidirectional communication. A custom pipeline handles real-time voice input and output, aiming to reduce barriers to engagement and foster a more natural conversational flow. The technical evaluation measures latency across short, long-form, and multi-turn dialogues and finds response times within tolerable bounds for synchronous use. Prompt engineering and system prompt customization guide empathetic, context-aware responses in standard therapeutic scenarios, though limitations persist in handling edge cases. These findings suggest that locally hosted voice-enabled LLMs can support responsive, privacy-conscious mental health applications, with future work directed toward fine-tuning for high-risk interactions.
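The latency evaluation described above can be pictured as timing each stage of a single voice turn (speech-to-text, generation, text-to-speech) separately. The following is a minimal sketch under stated assumptions, not the authors' implementation: `run_turn`, `TurnTiming`, and the three stage callables are hypothetical names, and the callables stand in for the Deepgram, Llama 3.1 8B, and OpenAI calls, whose actual client code the abstract does not describe.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class TurnTiming:
    """Wall-clock seconds spent in each stage of one voice turn."""
    stages: Dict[str, float] = field(default_factory=dict)

    @property
    def total(self) -> float:
        # End-to-end processing time is the sum of the per-stage times.
        return sum(self.stages.values())


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],   # hypothetical stand-in for the STT call
    generate: Callable[[str], str],       # hypothetical stand-in for the local LLM
    synthesize: Callable[[str], bytes],   # hypothetical stand-in for the TTS call
) -> Tuple[bytes, TurnTiming]:
    """Run one STT -> LLM -> TTS turn and record how long each stage took."""
    timing = TurnTiming()

    def timed(name, fn, arg):
        # perf_counter is a monotonic clock suited to measuring short intervals.
        start = time.perf_counter()
        result = fn(arg)
        timing.stages[name] = time.perf_counter() - start
        return result

    text = timed("stt", transcribe, audio)
    reply = timed("llm", generate, text)
    speech = timed("tts", synthesize, reply)
    return speech, timing
```

Passing stub callables, for example `run_turn(b"hi", lambda a: a.decode(), lambda t: t.upper(), lambda r: r.encode())`, exercises the timing logic without any network access. Repeating the call over short, long-form, and multi-turn inputs would yield per-stage figures comparable in spirit to those the study reports. Network transport time over WebRTC is outside what this sketch measures.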