Modulation statistics allow robust prediction of speech recognition accuracy across many words, voices, and natural background sounds

Alex C. Clonan
Ian H. Stevenson
Monty A. Escabí

Abstract

Although humans excel at speech recognition, recognition accuracy can vary widely due to differences in background environments as well as the speaker’s voice quality, intonation, and pitch. Predicting when speech recognition will succeed or fail, however, remains an ongoing challenge in hearing research. Here we characterize recognition abilities across a wide range of natural conditions using digits spoken by many male and female talkers of multiple ages with 33 unique backgrounds. Across this diverse set of sounds, speech recognition is most strongly influenced by the spectrum and modulation statistics of the noise. Yet, articulatory features of the speech, including fundamental and formant frequencies, show categorically distinct modulatory effects on accuracy across age, gender, and words. We then show that a low-dimensional model of sound, based on computations in the auditory midbrain, accounts for participants’ single-trial recognition behavior across voices, words and backgrounds. Thus, speech-in-noise perception across extremely diverse natural conditions depends largely on a simple set of spectrotemporal statistics likely encoded by central neural populations.

Version published to 10.64898/2026.04.27.721224 on bioRxiv, Apr 30, 2026

Related articles

Modeling the Influence of Bandwidth and Envelope on Categorical Loudness Scaling

This article has 5 authors:
1. Stephen T. Neely
2. Sara E. Harris
3. Joshua J. Hajicek
4. Erik A. Petersen
5. Yi Shen
This article has no evaluations. Latest version: Apr 1, 2026

A model-based analysis of modulation masking effects on vocoded speech intelligibility

This article has 4 authors:
1. Cathrina Veigel
2. Helia Relaño-Iborra
3. Andrew Oxenham
4. Torsten Dau
This article has no evaluations. Latest version: Mar 28, 2026

Listeners store category identity and uncertainty in memory during spoken word recognition, but not acoustic detail

This article has 1 author:
1. Wednesday Bushong
This article has no evaluations. Latest version: Apr 17, 2026
