Prediction from Statistical Learning Aids Auditory Scene Analysis

Vibha Viswanathan
Srinidhi Narayanan
Ingrid S. Johnsrude
Jenny R. Saffran
Barbara G. Shinn-Cunningham

Abstract

Statistical regularities support auditory scene analysis across multiple levels. While acoustic regularities like comodulation and harmonicity aid bottom-up perceptual grouping, higher-level regularities like linguistic or musical structure must be learned to form a mental “schema” of statistical patterns. Although learned schemas may benefit comprehension by helping listeners perceptually separate and/or attend to a target sound stream in an acoustic mixture, the underlying mechanisms are unclear. Here, we used a statistical learning paradigm to expose listeners to sequences of speech syllables with fixed transitional probabilities, forming an artificial “language” of trisyllabic words. Following exposure, participants attended to one of two concurrent syllable streams and detected target syllables. Detection performance improved when the attended stream conformed to the statistical structure learned implicitly during exposure, with a larger benefit in the presence of a competing stream than in quiet. In contrast, predictability of the unattended stream had no effect on performance. Electroencephalography revealed that predictable targets elicited earlier parietal P300 “target-recognition” responses and enhanced neural tracking of the attended stream, with additional signatures of predictive processing observed even in the absence of targets. These findings demonstrate that learned statistical regularities enhance listening in noise by enabling predictive, schema-based selection of relevant input. Rather than facilitating automatic segregation of competing sounds, learned lexical schemas support auditory scene analysis through attentional template matching. Our findings establish a direct mechanistic link to the role of prediction in schema-based listening in noise.
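As a rough illustration of what "fixed transitional probabilities" means for an artificial language of trisyllabic words, the sketch below builds a syllable stream and estimates the probability of each syllable given the one before it. The syllables, the four-word lexicon, the stream length, and the no-immediate-repeat rule are all assumptions made for illustration; the abstract does not specify the study's actual stimuli or design.

```python
import random
from collections import Counter

# Hypothetical lexicon of trisyllabic "words" (not the study's stimuli).
WORDS = [("ba", "di", "ku"), ("go", "la", "tu"),
         ("pi", "ro", "se"), ("mo", "fe", "ni")]


def make_stream(n_words, seed=0):
    """Concatenate randomly chosen words, never repeating a word back to back."""
    rng = random.Random(seed)
    stream, prev = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != prev])
        stream.extend(word)
        prev = word
    return stream


def transitional_probabilities(stream):
    """Estimate P(next syllable | current syllable) from adjacent-pair counts."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


stream = make_stream(600)
tp = transitional_probabilities(stream)

# Within a word the next syllable is fully determined.
print("within-word  P(di | ba) =", tp[("ba", "di")])
# Across a word boundary the next syllable depends on which word follows,
# so the probability is well below 1 (near 1/3 with this lexicon).
print("across-word  P(go | ku) =", round(tp.get(("ku", "go"), 0.0), 2))
```

In a stream like this, the within-word transitions are perfectly predictable while the transitions across word boundaries are not, which is the kind of contrast a listener could in principle learn from exposure alone.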

Significance

Our remarkable ability to isolate a target sound source, such as a person’s voice, in noisy environments is essential for effective communication. This process, termed auditory scene analysis, is known to rely on low-level acoustic regularities, but it is unclear whether learned higher-level regularities, like linguistic structure, also contribute. Combined electroencephalography and behavioral experiments reveal that statistical prediction of upcoming target syllables, based on learned syllable-transition probabilities of an artificial language, improves attentional selection of a target sound stream in an acoustic mixture. Prediction enhances neural tracking of the attended stream and speeds neural recognition of auditory targets. These findings have implications for auditory training approaches to rehabilitate hearing-impaired individuals who struggle to understand speech in noise.

Version published to 10.64898/2026.04.21.719938 on bioRxiv
Apr 23, 2026
