Linguistic processing of task-irrelevant speech at a Cocktail Party
Curation statements for this article:
Curated by eLife
Summary:
This study addresses a current and important question: how deeply are "ignored" speech stimuli processed? By imposing a regular rhythm on the to-be-ignored speech and analyzing MEG responses in the frequency domain, the authors were able to show an increase in power at the phrasal level (1 Hz) of irrelevant speech when it contained structured (linguistic) content, but not at the word level (2 Hz) or the sentence level (0.5 Hz). This finding supports the idea that cortical activity represents syntactic information about the unattended speech. Source analysis shows that the task-irrelevant speech is processed at the sentence level in the left fronto-temporal area and posterior parietal cortex, and in a manner very different from the acoustical encoding of syllables. Though the study is intriguing and well designed, there are some issues that must be addressed to back up the claims of the paper.
Reviewer #1 and Reviewer #2 opted to reveal their name to the authors in the decision letter after review.
This article has been reviewed by the following groups
Listed in
- Evaluated articles (eLife)
Abstract
Paying attention to one speaker in noisy environments can be extremely difficult, because to-be-attended and task-irrelevant speech compete for processing resources. We tested whether this competition is restricted to acoustic-phonetic interference or whether it extends to competition for linguistic processing as well. Neural activity was recorded using magnetoencephalography (MEG) as human participants were instructed to attend to natural speech presented to one ear, while task-irrelevant stimuli were presented to the other. Task-irrelevant stimuli consisted either of random sequences of syllables, or of syllables structured to form coherent sentences, using hierarchical frequency-tagging.
We find that the phrasal structure of structured task-irrelevant stimuli was represented in the neural response in left inferior frontal and posterior parietal regions, indicating that selective attention does not fully eliminate linguistic processing of task-irrelevant speech. Additionally, neural tracking of to-be-attended speech in left inferior frontal regions was enhanced when competing with structured task-irrelevant stimuli, suggesting inherent competition between them for linguistic processing.
Impact Statement
Syntactic structure-building processes can be applied to speech that is task-irrelevant and should be ignored, demonstrating that selective attention does not fully eliminate linguistic processing of competing speech.
Article activity feed
-
Reviewer #3:
The use of frequency tagging to analyze continuous processing at phonemic, word, phrasal and sentence-levels offers a unique insight into neural locking at higher-levels. While the approach is novel, there are major concerns regarding the technical details and interpretation of results to support phrase-level responses to structured speech distractors.
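To make the frequency-tagging logic concrete, here is a minimal simulation sketch (not from the paper; the 4 Hz syllable rate is inferred from the reported 2 Hz word and 1 Hz phrase rates, and all signal parameters are illustrative). The key point: an isochronous syllable train carries acoustic energy only at the syllable rate and its harmonics, so any 1 Hz peak in the neural spectrum must reflect internally constructed phrasal structure rather than the acoustics.

```python
import numpy as np

fs = 100            # sampling rate (Hz) -- illustrative, not from the paper
dur = 60            # seconds of signal
t = np.arange(0, dur, 1 / fs)

# Acoustic stimulus: isochronous syllables at 4 Hz (impulse train).
# Word (2 Hz), phrase (1 Hz) and sentence (0.5 Hz) boundaries are NOT
# acoustically marked -- they exist only in the linguistic structure.
syllable_onsets = (np.arange(len(t)) % (fs // 4) == 0).astype(float)

# Hypothetical neural response: tracks the syllable rate, plus a weak
# component locked to the phrasal rate (1 Hz), as the study reports.
rng = np.random.default_rng(0)
response = (np.sin(2 * np.pi * 4 * t)
            + 0.3 * np.sin(2 * np.pi * 1 * t)
            + 0.5 * rng.standard_normal(len(t)))

def amp_at(x, f):
    """Amplitude of the Fourier component at frequency f (Hz)."""
    spec = np.abs(np.fft.rfft(x)) / len(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return spec[np.argmin(np.abs(freqs - f))]

# The 1 Hz peak appears in the response but not in the stimulus train.
print(amp_at(syllable_onsets, 1.0))   # stimulus: no phrasal tag (~0)
print(amp_at(response, 1.0))          # response: clear 1 Hz component
```

This is why a 1 Hz peak in the MEG spectrum is taken as evidence of phrase-level structure building, provided the stimulus acoustics are verified to be flat at that frequency.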
Major concerns:
Is the peak at 1Hz real and can it be attributed solely to the structured distractor?
The study did not comment on the spectral profile of the "attended" speech, and how much low modulation energy is actually attributed to the prosodic structure of attended sentences? To what extent does the interplay of the attended utterance and distractor shape the modulation dynamics of the stimulus (even dichotically)?
How is the ITPC normalized? Figure 2 speaks of a normalization, but it is not clear how it was performed. The peak at 1 Hz appears extremely weak and no more significant (visually) than other peaks, say around 3 Hz, and also 2.5 Hz in the case of non-structured speech. Can the authors report on the regions in modulation space that showed any significant deviations? What about the effect size of the 1 Hz peak relative to these other regions?
It is hard to understand where the noise floor lies in this analysis; this floor will rotate with the permutation test performed on the ITPC and may not be fully accounted for. This issue depends on the chosen normalization procedure. The interpretation put forth by the authors, that the lack of a 0.5 Hz peak is due to noise, still raises the question of how the observed 1 Hz peak should be interpreted.
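One common way to estimate ITPC and establish a permutation-based noise floor, sketched here under illustrative assumptions (this is not necessarily the procedure the authors used; all parameter values are made up), is to compare the observed coherence against a null distribution obtained by circularly shifting each trial, which destroys inter-trial phase locking while preserving each trial's spectrum:

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, fs, dur = 30, 100, 10    # illustrative values
n = fs * dur
t = np.arange(n) / fs

# Simulated single trials: a weak 1 Hz phase-locked component in noise.
trials = (0.2 * np.sin(2 * np.pi * 1 * t)
          + rng.standard_normal((n_trials, n)))

def itpc(trials):
    """Inter-trial phase coherence: resultant length of per-trial phases."""
    phases = np.angle(np.fft.rfft(trials, axis=1))
    return np.abs(np.mean(np.exp(1j * phases), axis=0))

observed = itpc(trials)
freqs = np.fft.rfftfreq(n, 1 / fs)

# Permutation noise floor: destroy phase locking by circularly shifting
# each trial by a random amount, then recompute ITPC.
n_perm = 200
null = np.empty((n_perm, len(freqs)))
for p in range(n_perm):
    shifted = np.array([np.roll(tr, rng.integers(n)) for tr in trials])
    null[p] = itpc(shifted)

floor = np.percentile(null, 95, axis=0)   # pointwise 95th percentile
bin_1hz = np.argmin(np.abs(freqs - 1.0))
print(observed[bin_1hz] > floor[bin_1hz])  # 1 Hz coherence exceeds floor
```

Note that the expected null ITPC depends on the number of trials (roughly 1/sqrt(n_trials)), which is one reason an explicit permutation floor, rather than visual comparison of peaks, is needed to judge whether the 1 Hz peak is real.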
Control of attention during task performance
The authors present a very elegant analysis of possible alternative accounts of the results, but they acknowledge that attention switches, even if irregular, could result in accumulated information that emerges as a small neurally-locked response at the phrase level. As the authors indicate, designing an experiment that fully controls for such switches would be a real feat. That being said, additional analyses could shed some light on variations in attentional state and their effect on the observed results; for instance, an analysis of behavioral data across different trials (which would not be conclusive, but could be informative).
This issue is further compounded by the fact that a rather similar study (Ding et al.) did not report any phrasal-level processing, though there are design differences. The authors suggest differences in attentional load as a possible explanation and provide a very appealing reinterpretation of the literature based on a continuous model of processing driven by task demands. While theoretically interesting, it is not clear whether any of the current data support such an account. Again, perhaps a correlation between neural responses and behavioral performance in specific trials could shed some light on, or strengthen, this claim.
Additional comments:
What is the statistic shown for the behavioral results? Is this for the multiple choice question? Then what is the t-test on?
Beyond inter-trial phase coherence, can the authors comment on actual power-locked responses at the same corresponding rates?
-
Reviewer #2:
This paper by Har-shai Yahav and Zion Golumbic investigates the coding of higher level linguistic information in task-irrelevant speech. The experiment uses a clever design, where the task-irrelevant speech is structured hierarchically so that the syllable, word, and sentence levels can be ascertained separately in the frequency domain. This is then contrasted with a scrambled condition. The to-be-attended speech is naturally uttered and the response is analyzed using the temporal response function. The authors report that the task-irrelevant speech is processed at the sentence level in the left fronto-temporal area and posterior parietal cortex, in a manner very different from the acoustical encoding of syllables. They also find that the to-be-attended speech responses are smaller when the distractor speech is not scrambled, and that this difference shows up in exactly the same fronto-temporal area--a very cool result.
This is a great paper. It is exceptionally well written from start to finish. The experimental design is clever, and the results were analyzed with great care and are clearly described.
The only issue I had with the results is that the possibility (or likelihood, in my estimation) that the subjects are occasionally letting their attention drift to the task-irrelevant speech rather than processing in parallel can't be rejected. To be fair, the authors include a nice discussion of this very issue and are careful with the language around task-relevance and attended/unattended stimuli. It is indeed tough to pull apart. The second paragraph on page 18 states "if attention shifts occur irregularly, the emergence of a phase-rate peak in the neural response would indicate that bits of 'glimpsed' information are integrated over a prolonged period of time." I agree with the math behind this, but I think it would only take occasional lapses lasting 2 or 3 seconds to get the observed results, and I don't consider that "prolonged." It is, however, much longer than a word, so nicely rejects the idea of single-word intrusions.
-
Reviewer #1:
The present study sought to better characterize how listeners deal with competing speech streams from multiple talkers, that is, whether unattended speech in a multitalker environment competes exclusively for lower-level acoustic/phonetic resources or whether it competes for higher-level linguistic processing resources as well. The authors recorded MEG data and used hierarchical frequency tagging in an unattended speech stream presented to one ear while listeners were instructed to attend to stories presented in the other ear. The study found that when the irrelevant speech contained structured (linguistic) content, an increase in power at the phrasal level (1 Hz) was observed, but not at the word level (2 Hz) or the sentence level (0.5 Hz). This suggests that some syntactic information in the unattended speech stream is represented in cortical activity, and that there may be a disconnect between lexical (word-level) processing and syntactic processing. Source analyses of the difference between conditions indicated activity in left inferior frontal and left posterior parietal cortices. Analysis of the source activity underlying the linear transformation of the stimulus and response revealed activation in the left inferior frontal (and nearby) cortex. Implications for the underlying mechanisms (whether attentional shift or parallel processing) are discussed. The results have important implications for the debate on the type and amount of representation that unattended speech streams receive.
The authors utilize clever tools which arguably provided a unique means to address the main research question, i.e., they used hierarchical frequency tagging for the distractor speech, which allowed them to assess linguistic representations at different levels (syllable-, word-, phrase-, and sentence-level). This technique enabled the authors to make claims about the level of the language hierarchy at which the stimuli are being processed, depending on the observed frequency modulation in neural activity. These stimuli were presented during MEG recording, which let the authors assess changes in neurophysiological processing in near real time--essential for research on spoken language. Source analyses of these data provided information on the potential neural mechanisms involved in this processing. The authors also assessed a temporal response function (TRF) based on the speech envelope to determine the brain regions involved at these different levels of linguistic analysis of the distractor speech.
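For readers unfamiliar with the TRF approach, here is a minimal sketch of envelope-based TRF estimation via ridge regression over time lags, in the spirit of mTRF-style analyses (this is a generic illustration, not the authors' pipeline; the sampling rate, lag window, regularizer, and simulated kernel are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
fs, dur = 64, 120                   # illustrative sampling rate and duration
n = fs * dur

# Speech envelope (stimulus feature) -- smoothed noise as a stand-in.
env = np.convolve(rng.standard_normal(n), np.ones(8) / 8, mode="same")

# Ground-truth TRF: a biphasic kernel over ~250 ms of lags.
lags = np.arange(int(0.25 * fs))                     # 0..250 ms
true_trf = np.sin(2 * np.pi * lags / len(lags)) * np.exp(-lags / 8)

# Neural channel = stimulus convolved with the TRF, plus noise.
meg = np.convolve(env, true_trf)[:n] + 0.5 * rng.standard_normal(n)

# Build the lagged design matrix X[t, k] = env[t - k].
X = np.column_stack([np.roll(env, k) for k in lags])
X[: len(lags)] = 0                                   # drop wrapped-around samples

# Ridge regression: trf_hat = (X'X + lambda*I)^-1 X'y
lam = 1e2                                            # illustrative regularizer
trf_hat = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ meg)

# The estimate should recover the shape of the true kernel.
r = np.corrcoef(trf_hat, true_trf)[0, 1]
print(r)
```

The resulting kernel describes how the measured neural signal responds, at each lag, to fluctuations in the stimulus envelope, and its source localization is what lets the authors ask which regions track the attended versus task-irrelevant stream.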
Critiques:
Speech manipulation:
In general, it is unclear what predictions to make regarding the frequency tagging of the unattended distractor speech. On the one hand, the imposed artificial rhythmicity (necessary for the frequency-tagging approach) may make it easier for listeners to ignore the speech stream, and thus seeing an effect at higher-level frequency tags would be all the more noteworthy, although perhaps not entirely plausible. On the other hand, having the syllables presented at a consistent rate may make it easier for listeners to parse words and phrasal units, because they know precisely when a word/phrase/sentence boundary will occur, allowing them to check on the irrelevant speech stream at predictable times. For both the frequency-tagging and TRF electrophysiological results, the enhancement for task-irrelevant structured speech could be interpreted as an infiltration of this information into the neural signal (as the authors suggest), but because the behavioral results do not differ, this interpretation is not easily supported. This pattern of results is difficult to interpret.
Behavioral Results:
Importantly, no behavioral difference in accuracy was observed between the two irrelevant speech conditions (structured vs. non-structured), which makes it difficult to interpret what impact the structured irrelevant speech had on attentive listening. If the structured speech truly "infiltrates" or "competes" for linguistic processing resources, the reader would expect a decrease in task accuracy in the structured condition. This behavioral pattern has been observed in other studies. This calls into question the face validity of the stimuli and task being used.
Attention:
In this study, activation of posterior parietal cortex was found, which could be indicative of a strong attentional manipulation, suggesting that the task was in fact quite attentionally demanding for subjects to perform. This may align with the lack of behavioral difference between structured and non-structured irrelevant stimuli. Perhaps subjects attempted to divide their attention, which may have been possible between speech that was natural and speech that was rather artificial. The current results may align with a recent proposal that inferior frontal activity can be distinguished into language-selective and domain-general patterns.
Lack of word level response:
A major concern is that the results do not seem to replicate those of an earlier study using the same structured stimuli, in which effects were seen for sentence- and word-level frequency tagging. As the authors discuss, it is difficult to understand how a phrasal-level effect could be obtained without word-level processing, so a response at the word level would be expected.
Familiarization phase:
The study included a familiarization phase with the stimuli, to help participants understand the artificial speech. However, it would seem much easier for listeners to report back structured rather than unstructured stimuli. This is relevant to understanding any potential differences between the two conditions. It is unclear whether any quantification of performance/understanding was made at this phase. If there is no difference in the familiarization phase, this might explain why there was no difference in behavior between the two conditions during the actual task. Or, if there is a difference at the familiarization phase (i.e. structured sequences are more easily repeated back than non-structured sequences), this might help explain the neural result at 1 Hz, given that some higher level of processing must have occurred for the structured speech (such as "chunking" into words/phrasal units).