Automated Severity and Breathiness Assessment of Disordered Speech Using a Speech Foundation Model
Abstract
In this study, we propose a novel automated speech quality estimation model that evaluates perceptual dysphonia severity and breathiness in audio samples in alignment with expert-rated assessments. The proposed model integrates Whisper ASR embeddings with Mel spectrograms augmented by second-order delta features, combining them through a sequential-attention fusion network feature-mapping path. This hybrid approach enhances the model’s sensitivity to phonetic content, high-level feature representations, and spectral variations, enabling more accurate predictions of perceptual speech quality. The sequential-attention fusion network feature-mapping module captures long-range dependencies through multi-head attention, while LSTM layers refine the learned representations by modeling temporal dynamics. Comparative analysis against state-of-the-art methods for dysphonia assessment demonstrates our model’s superior generalization across test samples. Our findings underscore the effectiveness of ASR-derived embeddings combined with a deep feature-mapping structure for speech quality assessment, offering a promising pathway for advancing automated evaluation systems.
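As a rough illustration of the fusion path described in the abstract, the PyTorch sketch below projects frame-level Whisper encoder embeddings and Mel-plus-delta features into a shared space, applies multi-head attention to capture long-range dependencies, and refines the sequence with an LSTM before regressing a single perceptual score. All layer sizes, feature dimensions, and helper names (`AttentionFusionRegressor`, `mel_with_deltas`) are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch, assuming hypothetical dimensions and module names;
# not the authors' implementation.
import torch
import torch.nn as nn
import torchaudio


class AttentionFusionRegressor(nn.Module):
    """Fuses ASR embeddings with Mel + delta features, then maps the fused
    sequence through multi-head attention and an LSTM to a single
    perceptual score (e.g., severity or breathiness)."""

    def __init__(self, asr_dim=512, spec_dim=80 * 3, d_model=256, n_heads=4):
        super().__init__()
        self.asr_proj = nn.Linear(asr_dim, d_model)
        self.spec_proj = nn.Linear(spec_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.lstm = nn.LSTM(d_model, d_model // 2, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(d_model, 1)

    def forward(self, asr_emb, spec_feats):
        # asr_emb:    (batch, T1, asr_dim)  frame-level Whisper encoder states
        # spec_feats: (batch, T2, spec_dim) log-Mel + delta + delta-delta
        x = torch.cat([self.asr_proj(asr_emb),
                       self.spec_proj(spec_feats)], dim=1)
        x, _ = self.attn(x, x, x)           # long-range dependencies
        x, _ = self.lstm(x)                 # temporal refinement
        return self.head(x.mean(dim=1))     # utterance-level score


def mel_with_deltas(waveform, sample_rate=16000):
    """Example spectral front end: 80-bin log-Mel spectrogram with first-
    and second-order deltas stacked along the feature axis."""
    mel = torchaudio.transforms.MelSpectrogram(sample_rate, n_mels=80)(waveform)
    log_mel = torch.log(mel + 1e-6)
    d1 = torchaudio.functional.compute_deltas(log_mel)
    d2 = torchaudio.functional.compute_deltas(d1)
    # (batch, time, 240) to match the regressor's spec_feats input
    return torch.cat([log_mel, d1, d2], dim=1).transpose(1, 2)
```

In this sketch the two feature streams are concatenated along the time axis after projection to a common width; the original paper's exact fusion strategy (e.g., cross-attention or frame alignment between streams) may differ.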