Beyond word error rate: Multidimensional evaluation of ASR performance for digital speech biomarkers
Abstract
Automatic speech recognition (ASR) is increasingly used in digital health, yet its reliability for populations with atypical speech, such as people with dementia, is not well characterised beyond word error rate (WER). We evaluated eight ASR systems on speech from adults with dementia and healthy older adults using WER, part-of-speech sequence agreement (POS-sqWER), part-of-speech distribution mismatch (POS-MDE), sentence-embedding cosine distance, and stutter detection error (SDE). Mixed-effects models with Tukey-corrected contrasts were used for comparison. Performance was consistently poorer for dementia speech across all metrics. AWS and CrisperWhisper showed relatively strong lexical, semantic, and syntactic fidelity, whereas Google and Meta exhibited lower accuracy. Other systems showed intermediate performance with rankings varying by metric. POS-MDE revealed syntactic distortions not captured by WER or POS-sqWER. SDE performance was low across systems. Multidimensional evaluation reveals clinically relevant linguistic distortions obscured by WER alone, supporting the need for population- and task-specific ASR validation in digital health research.
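As an illustration of the baseline metric the abstract critiques (not code from the paper itself), WER is conventionally computed as the word-level edit distance between reference and hypothesis transcripts, normalised by reference length. A minimal sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over word sequences
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

As the abstract argues, a single substitution that preserves WER can still change part of speech or meaning (e.g. "cat" → "bat" yields WER = 1/3 but alters the semantics), which is why the complementary POS- and embedding-based metrics are proposed.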