Analysis: Serving Individuals with Language Impairments using Automatic Speech Recognition Models and Large Language Models: Challenges and Opportunities

Abstract

Large language models (LLMs) have attracted much attention for healthcare applications, demonstrating strong potential in automating conversational interactions. However, cloud-hosted LLMs pose major data privacy concerns when processing Protected Health Information (PHI). Moreover, current LLM-based systems rely on text input/output, creating substantial barriers for users, such as children and older adults, who may have difficulty typing. To mitigate these challenges, there has been growing interest in developing edge device-based, voice-enabled LLM systems. Running LLMs on edge devices minimizes the risk of PHI leaking to the cloud, while automatic speech recognition (ASR) eliminates the need for text-based input. Despite these advantages, existing ASR systems convert speech verbatim into text that often contains disfluencies, fillers (e.g., "um", "hum"), and grammatical errors, especially for individuals with language impairments. This noisy input can significantly degrade the performance of the downstream LLM, yet this cascading issue remains under-explored in healthcare applications. To address this critical gap, we conducted a systematic analysis through comparison studies and ablation experiments to identify key factors affecting the performance of edge-based ASR-LLM systems when used by individuals with language impairments. Furthermore, we proposed an evaluation framework for speech-enabled AI healthcare that emphasizes both interpretability and robustness, paving the way for more inclusive and secure conversational healthcare solutions.
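As a rough illustration of the chained pipeline the abstract describes, the sketch below wires an on-device ASR model to a locally hosted LLM, with a simple filler-removal pass in between. This is a minimal sketch under stated assumptions, not the authors' implementation: the use of openai-whisper and llama-cpp-python, the model paths, the filler list, and the prompt are all illustrative choices.

```python
# Minimal sketch of an edge ASR -> LLM chain. Assumes openai-whisper for
# on-device transcription and llama-cpp-python for a locally hosted LLM;
# model files, filler patterns, and the prompt are illustrative only.
import re

import whisper                # pip install openai-whisper
from llama_cpp import Llama   # pip install llama-cpp-python

# Hypothetical filler patterns; real disfluency handling is far richer.
FILLERS = r"\b(um+|uh+|hum+|er+|hmm+)\b"


def transcribe(audio_path: str) -> str:
    """Run on-device ASR and return the raw, possibly disfluent transcript."""
    asr = whisper.load_model("base")  # small model suited to edge hardware
    return asr.transcribe(audio_path)["text"]


def clean_transcript(text: str) -> str:
    """Strip simple fillers and collapse extra whitespace."""
    text = re.sub(FILLERS, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", text).strip()


def respond(transcript: str, model_path: str = "local-llm.gguf") -> str:
    """Feed the cleaned transcript to a local LLM; nothing leaves the device."""
    llm = Llama(model_path=model_path, n_ctx=2048)
    prompt = f"The user said: {transcript}\nRespond clearly and helpfully:\n"
    out = llm(prompt, max_tokens=256)
    return out["choices"][0]["text"].strip()


if __name__ == "__main__":
    raw = transcribe("user_query.wav")          # hypothetical audio file
    print(respond(clean_transcript(raw)))
```

Keeping both the ASR and the LLM on the device addresses the PHI-leakage concern raised in the abstract, while the cleanup step between them is exactly where the paper locates the failure mode: for speakers with language impairments, naive filler removal is unlikely to be sufficient, motivating the systematic analysis described above.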
