Analysis: Serving Individuals with Language Impairments using Automatic Speech Recognition Models and Large Language Models: Challenges and Opportunities

Abstract

Large language models (LLMs) have attracted much attention for healthcare applications, demonstrating strong potential in automating conversational interactions. However, cloud-hosted LLMs pose major data privacy concerns when processing Protected Health Information (PHI). Moreover, current LLM-based systems rely on text input/output, creating substantial barriers for users, such as children and older adults, who may have difficulty typing. To mitigate these challenges, there has been growing interest in developing edge device-based, voice-enabled LLM systems. Running LLMs on edge devices minimizes the risk of PHI leaking to the cloud, while automatic speech recognition (ASR) eliminates the need for text-based input. Despite these advantages, existing ASR systems convert speech verbatim into text that often contains disfluencies, fillers (e.g., "um", "hum"), and grammatical errors, especially for individuals with language impairments. This noisy input can significantly degrade the performance of the downstream LLM, yet this cascading issue remains under-explored in healthcare applications. To address this critical gap, we conducted a systematic analysis through comparison studies and ablation experiments to identify key factors affecting the performance of edge-based ASR-LLM systems when used by individuals with language impairments. Furthermore, we proposed an evaluation framework for speech-enabled AI healthcare that emphasizes both interpretability and robustness, paving the way for more inclusive and secure conversational healthcare solutions.
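As a rough illustration of the chained pipeline the abstract describes, the sketch below wires an on-device ASR model to a locally hosted LLM, with a simple filler-removal pass in between. This is a minimal sketch under stated assumptions, not the authors' implementation: the use of openai-whisper and llama-cpp-python, the model paths, the filler list, and the prompt are all illustrative choices.

```python
# Minimal sketch of an edge ASR -> LLM chain. Assumes openai-whisper for
# on-device transcription and llama-cpp-python for a locally hosted LLM;
# model files, filler patterns, and the prompt are illustrative only.
import re

import whisper                # pip install openai-whisper
from llama_cpp import Llama   # pip install llama-cpp-python

# Hypothetical filler patterns; real disfluency handling is far richer.
FILLERS = r"\b(um+|uh+|hum+|er+|hmm+)\b"


def transcribe(audio_path: str) -> str:
    """Run on-device ASR and return the raw, possibly disfluent transcript."""
    asr = whisper.load_model("base")  # small model suited to edge hardware
    return asr.transcribe(audio_path)["text"]


def clean_transcript(text: str) -> str:
    """Strip simple fillers and collapse extra whitespace."""
    text = re.sub(FILLERS, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", text).strip()


def respond(transcript: str, model_path: str = "local-llm.gguf") -> str:
    """Feed the cleaned transcript to a local LLM; nothing leaves the device."""
    llm = Llama(model_path=model_path, n_ctx=2048)
    prompt = f"The user said: {transcript}\nRespond clearly and helpfully:\n"
    out = llm(prompt, max_tokens=256)
    return out["choices"][0]["text"].strip()


if __name__ == "__main__":
    raw = transcribe("user_query.wav")          # hypothetical audio file
    print(respond(clean_transcript(raw)))
```

Keeping both the ASR and the LLM on the device addresses the PHI-leakage concern raised in the abstract, while the cleanup step between them is exactly where the paper locates the failure mode: for speakers with language impairments, naive filler removal is unlikely to be sufficient, motivating the systematic analysis described above.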
