Performance Analysis of Speech Recognition Models in Automated Scoring of the QuickSIN Test
Abstract
Purpose
Best practices in audiology recommend assessing speech understanding in noisy environments, especially for those with communication difficulties. Speech-in-noise (SiN) assessments such as the QuickSIN are used for validating signal processing in hearing aids (HAs) and are linked to HA satisfaction. This project seeks to enhance QuickSIN test efficiency by applying recent advancements in automatic speech recognition (ASR) technologies.
Method
Twenty-three adults with sensorineural hearing loss were fitted bilaterally with Unitron Moxi HAs and were administered the QuickSIN test in low and high reverberation environments. Testing was performed with two different HA programs: an omnidirectional program and a fixed directional microphone program. QuickSIN sentences were presented from 0° azimuth and competing babble from either 0°, laterally from 90° or 270°, or simultaneously from 90°, 180°, and 270° azimuths. Participants’ verbal responses to QuickSIN stimuli were scored by an audiologist and were recorded in parallel for offline transcription and scoring by ASR models from Amazon, Microsoft, NVIDIA, and Picovoice. The ASR-derived QuickSIN scores were compared to the corresponding audiologist-derived scores.
Results
Repeated-measures ANOVA results revealed that all ASR models overestimated QuickSIN scores across most test conditions. Bland-Altman analyses showed that, relative to manual scoring by an experienced audiologist, the Amazon ASR model had the least bias and the narrowest limits of agreement.
Conclusions
Some ASR models, such as Amazon's, demonstrated performance comparable to that of an audiologist in automatically scoring QuickSIN tests. However, further refinements are necessary to increase the robustness of the ASR models in test conditions yielding low SNR loss.