Prediction from Statistical Learning Aids Auditory Scene Analysis

Abstract

Statistical regularities support auditory scene analysis across multiple levels. While acoustic regularities like comodulation and harmonicity aid bottom-up perceptual grouping, higher-level regularities like linguistic or musical structure must be learned to form a mental “schema” of statistical patterns. Although learned schemas may benefit comprehension by helping listeners perceptually separate and/or attend to a target sound stream in an acoustic mixture, the underlying mechanisms are unclear. Here, we used a statistical learning paradigm to expose listeners to sequences of speech syllables with fixed transitional probabilities, forming an artificial “language” of trisyllabic words. Following exposure, participants attended to one of two concurrent syllable streams and detected target syllables. Detection performance improved when the attended stream conformed to the statistical structure learned implicitly during exposure, with a larger benefit in the presence of a competing stream than in quiet. In contrast, predictability of the unattended stream had no effect on performance. Electroencephalography revealed that predictable targets elicited earlier parietal P300 “target-recognition” responses and enhanced neural tracking of the attended stream, with additional signatures of predictive processing observed even in the absence of targets. These findings demonstrate that learned statistical regularities enhance listening in noise by enabling predictive, schema-based selection of relevant input. Rather than facilitating automatic segregation of competing sounds, learned lexical schemas support auditory scene analysis through attentional template matching. Our findings thus establish a direct mechanistic role for prediction in schema-based listening in noise.
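
To make the exposure paradigm concrete, the following is a minimal sketch of how a syllable stream with fixed transitional probabilities could be built and checked. It is not the authors' stimulus code: the syllable inventory, the four-word vocabulary, the no-immediate-repeat rule, and the function names are all assumptions chosen for illustration.

```python
import random
from collections import Counter

# Hypothetical vocabulary of four trisyllabic "words" (not the study's stimuli).
WORDS = [("tu", "pi", "ro"), ("go", "la", "bu"),
         ("bi", "da", "ku"), ("pa", "do", "ti")]


def make_stream(n_words, seed=0):
    """Concatenate randomly chosen words, never repeating a word back to back."""
    rng = random.Random(seed)
    stream, prev = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != prev])
        stream.extend(word)
        prev = word
    return stream


def transitional_probabilities(stream):
    """Estimate P(next syllable | current syllable) from bigram counts."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


stream = make_stream(300)
tp = transitional_probabilities(stream)

# Within a word the next syllable is fully determined.
within = tp[("tu", "pi")]
# After a word-final syllable, several word onsets can follow.
across = [p for (a, _), p in tp.items() if a == "ro"]
print("within-word TP:", within)
print("across-boundary TPs:", [round(p, 2) for p in across])
```

Under these assumptions, within-word transitions are fully predictable while transitions across word boundaries are split among the other words' first syllables. That contrast is the kind of statistical structure a listener could learn implicitly during exposure.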

Significance

Our remarkable ability to isolate a target sound source, such as a person’s voice, in noisy environments is essential for effective communication. This process, termed auditory scene analysis, is known to rely on low-level acoustic regularities, but it is unclear whether learned higher-level regularities, like linguistic structure, also contribute. Combined electroencephalography and behavioral experiments reveal that statistical prediction of upcoming target syllables, based on the learned syllable-transition probabilities of an artificial language, improves attentional selection of a target sound stream in an acoustic mixture. Prediction enhances neural tracking of the attended stream and speeds neural recognition of auditory targets. These findings have implications for auditory training approaches to rehabilitate hearing-impaired individuals who struggle to understand speech in noise.
