Recurrent neural networks as neuro-computational models of human speech recognition

Abstract

Human speech recognition transforms a continuous acoustic signal into categorical linguistic units by aggregating information that is distributed in time. It has been suggested that this kind of information processing may be understood through the computations of a Recurrent Neural Network (RNN) that receives input frame by frame, linearly in time, but builds an incremental representation of this input through a continually evolving internal state. While RNNs can simulate several key behavioral observations about human speech and language processing, it is unknown whether RNNs also develop computational dynamics that resemble human neural speech processing. Here we show that the internal dynamics of long short-term memory (LSTM) RNNs, trained to recognize speech from auditory spectrograms, predict human neural population responses to the same stimuli, beyond predictions from auditory features. Variations in the RNN architecture motivated by cognitive principles further improve this predictive power. Moreover, different components of hierarchical RNNs predict separable components of brain responses to speech in an anatomically structured manner, suggesting that RNNs reproduce a hierarchy of speech recognition in the brain. Our results suggest that RNNs provide plausible computational models of the cortical processes supporting human speech recognition.
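To make the modeling setup concrete, the sketch below illustrates the kind of architecture the abstract describes: a stacked LSTM that consumes auditory-spectrogram frames one time step at a time, produces per-frame phoneme predictions, and exposes its hidden-state trajectories, which could then be regressed against neural population responses. This is a minimal illustration, not the authors' implementation; all layer sizes, label counts, and names are assumptions.

```python
# Minimal sketch (assumed shapes and sizes, not the authors' code) of an
# LSTM speech recognizer whose hidden states can serve as candidate
# predictors of neural responses.

import torch
import torch.nn as nn


class SpectrogramLSTM(nn.Module):
    """Frame-by-frame recognizer: spectrogram frames -> per-frame phoneme logits."""

    def __init__(self, n_freq_bins=80, hidden_size=256, n_phonemes=40, n_layers=2):
        super().__init__()
        # Stacked ("hierarchical") LSTM; each layer's states could be read out
        # separately and compared against different components of brain responses.
        self.lstm = nn.LSTM(n_freq_bins, hidden_size,
                            num_layers=n_layers, batch_first=True)
        self.classifier = nn.Linear(hidden_size, n_phonemes)

    def forward(self, spectrogram):
        # spectrogram: (batch, time, n_freq_bins), one acoustic frame per time step
        hidden_states, _ = self.lstm(spectrogram)   # (batch, time, hidden_size)
        logits = self.classifier(hidden_states)     # per-frame phoneme logits
        return logits, hidden_states


if __name__ == "__main__":
    model = SpectrogramLSTM()
    dummy_spec = torch.randn(4, 200, 80)            # 4 utterances, 200 frames each
    logits, states = model(dummy_spec)

    # After training on a speech corpus, the per-frame hidden states would be
    # regressed against neural population responses (e.g., with ridge regression)
    # to test whether they explain variance beyond auditory spectrogram features.
    print(logits.shape, states.shape)
```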
