Never Mind the Repeat: How Speech Expectations Reduce Tracking at the Cocktail Party

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

When the brain focuses on a conversation in a noisy environment, it exploits past experience to prioritize relevant elements from the auditory scene. This prompts the question of what changes occur in the selective neural processing of speech mixtures as listeners garner prior experience about single speech objects. In three different priming experiments, we quantified cortical selection of temporal landmarks from continuous speech, applying the temporal response function (TRF) method to single-trial electroencephalography (EEG) recordings. The designs specifically addressed how attention interacts with exact (Experiment 1), voice (Experiment 2a), or message (Experiment 2b) content priming of the target or background speakers in cortical responses to speech. Our results demonstrate that, during multispeaker listening, attentional gains typical of cortical responses under speech selection are met with attenuations as a consequence of prior experience. The changes were observed at the P2 processing stage (220-320 ms) of speech envelope onset processing and were specific to responses to primed speech targets (Experiment 1). Suppressions at stages earlier than the P2, or under partial priming conditions (Experiments 2a and 2b), were not observed. An exploratory analysis suggests the observed P2 reduction predicts listeners’ ability to report target words, consistent with this component encoding in part temporal prediction error about onset edge cues exclusive to target speech. Our results show that at this late and definitive stage of selective attention, the auditory system may test the evidence for its own predictive model of the noise-invariant speech stream. Precise inference of its temporal structure is bound to tag all checkpoints where auditory evidence can be most reliably connected into higher-order representations of continuous speech.

Article activity feed