Neural tracking of phrases in spoken language comprehension is automatic and task-dependent
Curation statements for this article:-
Curated by eLife
Evaluation Summary:
This paper will be of interest to researchers studying how spoken language is processed in the brain. The results add to our understanding of how brain oscillations track language information at the syllable, word, and sentence level. The analyses are thoughtful and the key claims of the manuscript are largely supported by the data, although some conclusions may require additional support.
(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (eLife)
- Neuroscience (eLife)
Abstract
Linguistic phrases are tracked in sentences even though there is no one-to-one acoustic phrase marker in the physical signal. This phenomenon suggests an automatic tracking of abstract linguistic structure that is endogenously generated by the brain. However, all studies investigating linguistic tracking compare conditions where either relevant information at linguistic timescales is available, or where this information is absent altogether (e.g., sentences versus word lists during passive listening). It is therefore unclear whether tracking at phrasal timescales is related to the content of language, or rather, results as a consequence of attending to the timescales that happen to match behaviourally relevant information. To investigate this question, we presented participants with sentences and word lists while recording their brain activity with magnetoencephalography (MEG). Participants performed passive, syllable, word, and word-combination tasks corresponding to attending to four different rates: one they would naturally attend to, syllable-rates, word-rates, and phrasal-rates, respectively. We replicated overall findings of stronger phrasal-rate tracking measured with mutual information for sentences compared to word lists across the classical language network. However, in the inferior frontal gyrus (IFG) we found a task effect suggesting stronger phrasal-rate tracking during the word-combination task independent of the presence of linguistic structure, as well as stronger delta-band connectivity during this task. These results suggest that extracting linguistic information at phrasal rates occurs automatically with or without the presence of an additional task, but also that IFG might be important for temporal integration across various perceptual domains.
Article activity feed
Author Response
Reviewer #1 (Public Review):
In this study, the authors set out to clarify the relationship between brain oscillations and different levels of speech (syllables, words, phrases) using MEG. They presented word lists and sentences and used task instructions to attempt to focus listeners' attention on different levels of linguistic analysis (syllables, words, phrases).
- I came away with mixed feelings about the task design: following each stimulus (sentence or word list), participants were asked to (a) press a button (i.e. nothing related to what they heard), (b) indicate which of two syllables was heard, (c) indicate which of two words was heard, (d) indicate which pair of words was present in the correct order. This task is the critical manipulation in the study, as it is intended to encourage (or in the authors' words, "require") participants to focus on different timescales of speech (syllable, word, and phrase, respectively). I very much like the idea of keeping the physical stimuli unchanged, and manipulating attention through task demands - an elegant and effective approach. At the same time, I have reservations about the degree to which these task instructions altered attention during listening. My intuition is that, if I were a participant, I would just listen attentively, and then answer the question about the specific level. For example, I don't know that knowing I would be doing a "word pair" task, I would be attending at a slower rate than a "word" task, as in both cases I would be motivated to understand all of the words in the sentence. I fully acknowledge my introspection (n=1) may be flawed here, but nevertheless, any additional support validating the effect of these instructions would help the interpretation of the MEG results.
The reviewer points out that to do any task on sentences (such as a word task or a syllable task), participants’ strategy could be to understand the full meaning of the sentence and infer the lower-level properties from that understanding. We fully share this introspection, which would suggest that extracting sentence meaning is partly automatic (or at least a default mode of processing) and independent of behavioral relevance. While the reviewer sees this as a downside of the design, it is part of what our study tried to disentangle (automatic versus task-dependent processing at lower-frequency timescales). If, as the reviewer points out, all processing of sentences were automatic, we should not find any effect of task (as the task should not affect the tracking response at all). We found that overall the tracking response is robust to task-induced manipulation of attention: the main effect that MI at the phrasal rate is higher for sentences than for word lists holds across passive and task conditions. But that is not the whole story at the source level, where we do find some task effects, which indicates that task instructions do matter. This means that participants changed their strategy depending on the instructions, but that overall, tracking of linguistic structures such as phrases is automatic. We show that in the IFG, phrasal timescales are tracked more strongly (as measured with MI) during the phrase task than during the other tasks. This is also reflected in stronger STG-IFG connectivity during the phrasal versus the passive task. These results speak against the interpretation that task instructions did not alter attention during listening. While there are these subtle task differences, especially in IFG, overall our findings do speak for an automatic tracking of phrasal-rate structure in sentences independent of task.
We therefore concluded that “automatic understanding of linguistic information, and all the processing that this entails, cannot be countered to substantially change the consequences for neural readout, even when explicitly instructing participants to pay attention to particular time-scales” (line 548-549).
The analysis steps generally seem sensible and well-suited to answering the main claims of the study. Controlling for power differences between conditions through matching was a nice feature.
- I had a concern about accuracy differences (as seen in Figure 1) across stimulus materials and tasks. In particular, for the phrase task, participants were more accurate for sentence stimuli than word list stimuli. I think this makes a lot of sense, as a coherent sentence will be easier to remember in order than a list of words. But, I did not see accuracy taken into account in any of the analyses. These behavioral differences raise the possibility that the MEG results related to the sentence > word list contrast in phrases (which seems one of the most interesting findings in IFG) simply reflect differences in accuracy.
With the caveat of the concern regarding accuracy differences, the research goals were clear and the conclusions were generally supported by the analyses.
Thank you for pointing this out. We have now taken accuracy into account in our analysis. It did not change any of our main findings or conclusions, and strengthened the argument that tracking of phrases in sentences vs. word lists is stronger. The influence of task difficulty is a relevant point to investigate (also see point 1 of reviewer 2 and point 4 of reviewer 3). To do so we added accuracy (per participant per condition) as a factor in the mixed model (as well as all interactions with task and condition) for the MI, power, and connectivity analyses at the phrasal rate/delta band. Note that as for the passive task there is no accuracy, we removed the passive task from the analyses. We could also only run models with random intercepts (not random slopes), due to the reduced number of degrees of freedom when adding the factor accuracy to the models.
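For readers who want the model structure spelled out, the random-intercept model described above can be written roughly as follows. This is a sketch in our own notation, not the authors' exact specification: $y$ stands for the dependent measure (MI, power, or WPLI), $p$ indexes participants, $t$ the three active tasks, $c$ the two conditions, and $a$ is accuracy. Grouping the random intercept $u_p$ by participant is an assumption, as the text only states that random intercepts were used.

```latex
\begin{equation}
\begin{aligned}
y_{p,t,c} ={}& \beta_0 + \beta_{T}(t) + \beta_{C}(c) + \beta_{A}\,a_{p,t,c}
  + \beta_{TC}(t,c) \\
 &+ \beta_{TA}(t)\,a_{p,t,c} + \beta_{CA}(c)\,a_{p,t,c}
  + \beta_{TCA}(t,c)\,a_{p,t,c} + u_p + \varepsilon_{p,t,c}, \\
 & u_p \sim \mathcal{N}(0,\sigma_u^2), \qquad
   \varepsilon_{p,t,c} \sim \mathcal{N}(0,\sigma^2)
\end{aligned}
\end{equation}
```

The three-way term $\beta_{TCA}$ corresponds to the task*condition*accuracy interaction, and $\beta_{CA}$ to the condition*accuracy interaction; no random slopes appear, matching the restriction mentioned above.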
For the MI analysis we only found an effect in MTG. Specifically, there was a three-way interaction between task, condition and accuracy (F(2, 91.9) = 3.4591, p = 0.036). To follow up on this three-way interaction we split the data per task. The condition*accuracy interaction was significant only for the word combination task, and only without correction (F(1,24.8) = 5.296, p = 0.03 uncorrected), and not for any other task (p>0.1). In the word combination task, we found that the difference between sentences and word lists was strongest at high accuracies (see the figure below for the predicted values of the model). One way to interpret this finding is that stronger phrasal-rate MI tracking in MTG promotes phrasal-rate processing (as indicated by accuracy) more in sentences than in word lists.
MEG – behavioral performance relation. A) Predicted values for the phrasal band MI in the MTG for the word combination task separately for the two conditions. B) Predicted values for the delta band WPLI in the STG-MTG connection separately for the two conditions. Error bars indicate the 95% confidence interval of the fit. Colored lines at the bottom indicate individual datapoints.
For power we did not find any effect of accuracy. For the connectivity analysis we found a significant condition*accuracy interaction in the STG-MTG connection (F(1, 80.23) = 5.19, p = 0.025). The condition*accuracy interaction showed that lower accuracies were generally associated with stronger differences between the sentences and word lists (see figure; the opposite of the MI analysis). Thus, functional connections in the delta band are stronger during sentence processing when participants have difficulty with the task (independent of the task performed). This could indicate that low-frequency connections are more relevant for the sentence than the word list condition (as the reviewer also indicated in point 1).
After correcting for accuracy there was also a significant task*condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005 corrected), but not for the other tasks (p>0.1).
We added the results of the accuracy analyses to the main manuscript and added a dedicated section to our discussion (page 21-22). Adding accuracy did not remove any of the effects we report in the original analyses. Therefore, none of these findings change the interpretation of the results, as the task still had an influence on the MI responses of MTG and IFG. The effect of accuracy in the MTG refined the results, showing that the effect was strongest there for participants with high accuracies. This relationship suggests a functional role of tracking through phase alignment for understanding phrasal structure.
The methods now read: “MEG-behavioural performance analysis: To investigate the relation between the MEG measures and the behavioural performance we repeated the analyses (MI, power, and connectivity) but added accuracy as a factor (together with the interactions with the task and condition factor). As there is no accuracy for the passive task, we removed this task from the analysis. We then followed the same analysis steps as before. Since we reduced our degrees of freedom, we could however only create random intercept and not random slope models”.
The results now read: “MEG-behavioural performance relation. We found for the MI analysis a significant effect of accuracy only in the MTG. Here, we found a three-way accuracy*task*condition interaction (F(2, 91.9) = 3.459, p = 0.036). Splitting up for the three different tasks we found only an uncorrected significant effect for the condition*accuracy interaction for the phrasal task (F(1, 24.8) = 5.296, p = 0.03) and not for the other two tasks (p>0.1). In the phrasal task, we found that when accuracy was high, there was a stronger difference between the sentence and the word list condition compared to when accuracy was low, with stronger accuracy for the sentence condition (Figure 7A).
No relation between accuracy and power was found. For the connectivity analysis we found a significant condition*accuracy interaction for the STG-MTG connection (F(1,80.23) = 5.19, p = 0.025; Figure 7B). Independent of task, when accuracy was low the difference between sentence and word lists was stronger with higher WPLI fits for the sentence condition. After correcting for accuracy there was also a significant task*condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005), but not for the other tasks (p>0.1).”
The discussion now reads: “We found that across participants both the MI and the connectivity in temporal cortex influenced behavioural performance. Specifically, MTG-STG connections were, independent of task, related to accuracy. There was higher connectivity between MTG and STG for sentences compared to word lists at low accuracies. At high accuracies, we found stronger MTG tracking at phrasal rates (measured with MI) for sentences compared to word lists during the word combination task. These results suggest that tracking of phrasal structure in MTG is indeed relevant for understanding sentences compared to word lists. This was reflected in a general increase in delta connectivity differences when the task was difficult (Figure 7B). Participants might compensate for the difficulty using phrasal structure present in the sentence condition. When phrasal structure in sentences is accurately tracked (as measured with MI), performance is better when these rates are relevant (Figure 7A). These results point to a role for phrasal tracking in accurately understanding the higher order linguistic structure in sentences, even though more research is needed to verify this. It is evident that the connectivity and tracking correlations to behaviour do not explain all variation in the behavioural performance (compare Figure 1 with 3). Plainly, temporal tracking does not explain everything in language processing. Besides tracking there are many other components important for our designated tasks, such as memory load and semantic context, which are not captured by our current analyses.”
Reviewer #2 (Public Review):
In a MEG study, the authors investigate as their main question whether neural tracking at the phrasal time scale reflects linguistic structure building (testing different conditions: sentences vs. word-lists) or an attentional focus on the phrasal time scale (testing different tasks, passive listening, syllable task, word task, word combination/phrasal scale task). They perform the following analyses at brain areas (ROIs: STG, IFG, MTG) of the language network: (1) Mutual information (MI) between the acoustic envelope and the delta band neuronal signals is analyzed. (2) Power in the delta band is analyzed. (3) Connectivity is analyzed using debiased WPLI. For all analyses, linear mixed-models are separately conducted for each ROI. The main finding is that the sentence compared to the word-list condition is more strongly tracked at the phrasal scale (MI). In STG the effect was task-independent; in MTG the effect only occurred for active tasks; and in IFG additionally, the word-combining/phrasal scale task resulted in higher tracking compared to all other tasks. The authors conclude that phrasal scale neural tracking reflects linguistic processing which takes place automatically, while task-related attention contributes additionally at IFG (interpreted as combinatorial hub involved in language and non-language processing). The findings are stable when power differences are controlled. The connectivity analysis showed increased connectivity in the delta band (phrasal time scale) between IFG-STG in the phrasal-scale compared to the passive task (adding to the IFG MI findings). (Additionally, they separately analyze neural tracking at the syllabic and word time scale, which however is not in the main focus).
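As background for the connectivity measure named above, the following is a minimal sketch of the debiased squared WPLI estimator computed from single-trial cross-spectra. It is illustrative only: the function name and the input layout are our assumptions, and it is not the authors' pipeline, which this text does not describe in detail.

```python
import numpy as np


def debiased_wpli_squared(cross_spectra):
    """Debiased squared WPLI across trials.

    cross_spectra: complex array of shape (n_trials,) or (n_trials, n_freqs)
    holding the single-trial cross-spectrum between two signals
    (hypothetical input; computing it is outside this sketch).
    """
    im = np.imag(np.asarray(cross_spectra))
    sum_im = im.sum(axis=0)           # signed sum of imaginary parts
    sum_abs = np.abs(im).sum(axis=0)  # sum of their magnitudes
    sum_sq = (im ** 2).sum(axis=0)    # same-trial products, removed for debiasing
    denom = sum_abs ** 2 - sum_sq
    with np.errstate(divide="ignore", invalid="ignore"):
        # Undefined (NaN) when the denominator is not positive.
        return np.where(denom > 0, (sum_im ** 2 - sum_sq) / denom, np.nan)
```

A consistent non-zero phase lag across trials drives the value towards 1, whereas phase differences that are random across trials give values scattered around 0.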
Major strength/weaknesses of the methods and results:
- A major strength of the results is that part of them replicate the authors' earlier findings (i.e. higher tracking at the phrasal time scale for sentences compared to word-lists; Kaufeld et al., 2020), while they complement this earlier work by showing that the effects are due to linguistic processing and not to an attentional focus on the phrasal time scale due to the task (at least in STG and MTG; while the task plays a role for the IFG tracking). Another strength is that a power control analysis is applied, which allows excluding spurious results due to condition differences in power. A weakness of the method is that analyses were applied separately per ROI, and combined across correct/incorrect trials (if I understood correctly), no trial-based analysis was conducted (which is related to how MI is computed). Furthermore, several methodological details could be clarified in the manuscript.
The authors achieved their aims by providing evidence that neuronal tracking at the phrasal time scale in STG and MTG depends on the presence of linguistic information at this scale rather than indicating an attentional focus on this time scale due to a specific task. Their results support the conclusion. Results would be strengthened by showing that these effects are not impacted by different amounts of correct/incorrect trials across conditions (if I understood that correctly).
We thank the reviewer for her comments. It is correct that we collapsed across the correct and incorrect trials. This had various reasons (also see points 2 and 9 of reviewer 1 and point 4 of reviewer 3). First, our tasks function solely to direct participants’ attention to the various linguistic representations (syllables, words, phrases) and the timescales that they occur on. The three tasks are in a sense more control tasks to study the tracking response, and manipulate attention as tracking during spoken language comprehension occurs, rather than a case where the neural response to the tasks is itself to be studied. For example, in a typical working memory paradigm, it is only during correct trials that the relevant cognitive process occurs. In contrast, in our paradigm, it is likely that the spoken stimuli are heard and processed (in other words, that sentence comprehension and word list perception occur) even during incorrect trials in the syllable condition. As such, we do not expect MI tracking responses to explain the behavioral data. However, we agree it is crucially important to show that MI differences are not a function of task performance differences.
Second, there are clear differences in difficulty level of the trials within conditions. For example, if the target question was related to the last part of the audio fragment, the task was much easier than when it was at the beginning of the audio fragment. In the syllable task, if syllables also were (by chance) a part-word, the trial was also much easier. If we were to split the trials into correct and incorrect, we would not solely capture processes related to accurately processing the speech fragments, but would also confound the analysis with the individual difficulty level of the trials.
To acknowledge this, we added this limitation to the methods. The methods now read: “Note that different trials within a task were not matched for task difficulty. For example, in the syllable task syllables that make a word are much easier to recognize than syllables that do not make a word. Additionally, trials pertaining to the beginning of the sentence are more difficult than ones related to the end of the sentence due to recency effects.”
To still investigate if overall accuracy influenced the results we did add accuracy (across participants) to the mixed models. Note that as for the passive task there is no accuracy, we removed the passive task from the analyses. We could also only run models with random intercepts (not random slopes), due to the reduced number of degrees of freedom when adding the factor accuracy to the models.
For the MI analysis we only found an effect in MTG. Specifically, there was a three-way interaction between task, condition and accuracy (F(2, 91.9) = 3.4591, p = 0.036). To follow up on this three-way interaction we split the data per task. The condition*accuracy interaction was significant only for the word combination task, and only without correction (F(1,24.8) = 5.296, p = 0.03 uncorrected), and not for any other task (p>0.1). In the word combination task, we found that the difference between sentences and word lists was strongest at high accuracies (see the attached figure on the right for the predicted values of the model). One way to interpret this finding is that stronger phrasal-rate MI tracking in MTG promotes phrasal-rate processing (as indicated by accuracy) more in sentences than in word lists.
For power we did not find any effect of accuracy. For the connectivity analysis we found a significant condition*accuracy interaction in the STG-MTG connection (F(1, 80.23) = 5.19, p = 0.025). The condition*accuracy interaction showed that lower accuracies were generally associated with stronger differences between the sentences and word lists (see figure below; the opposite of the MI analysis). Thus, functional connections in the delta band are stronger during sentence processing when participants have difficulty with the task (independent of the task performed). This could indicate that low-frequency connections are more relevant for the sentence than the word list condition.
MEG – behavioral performance relation. A) Predicted values for the phrasal band MI in the MTG for the word combination task separately for the two conditions. B) Predicted values for the delta band WPLI in the STG-MTG connection separately for the two conditions. Error bars indicate the 95% confidence interval of the fit. Colored lines at the bottom indicate individual datapoints.
After correcting for accuracy there was also a significant task*condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005 corrected), but not for the other tasks (p>0.1).
We added the results of the accuracy analyses to the main manuscript and added a dedicated section to our discussion (page 21-22). Adding accuracy did not remove any of the effects we report in the original analyses. Therefore, none of these findings change the interpretation of the results, as the task still had an influence on the MI responses of MTG and IFG. The effect of accuracy in the MTG refined the results, showing that the effect was strongest there for participants with high accuracies. This relationship suggests a functional role of tracking through phase alignment for understanding phrasal structure.
The methods now read: “MEG-behavioural performance analysis: To investigate the relation between the MEG measures and the behavioural performance we repeated the analyses (MI, power, and connectivity) but added accuracy as a factor (together with the interactions with the task and condition factor). As there is no accuracy for the passive task, we removed this task from the analysis. We then followed the same analysis steps as before. Since we reduced our degrees of freedom, we could however only create random intercept and not random slope models”.
The results now read: “MEG-behavioural performance relation. We found for the MI analysis a significant effect of accuracy only in the MTG. Here, we found a three-way accuracy*task*condition interaction (F(2, 91.9) = 3.459, p = 0.036). Splitting up for the three different tasks we found only an uncorrected significant effect for the condition*accuracy interaction for the phrasal task (F(1, 24.8) = 5.296, p = 0.03) and not for the other two tasks (p>0.1). In the phrasal task, we found that when accuracy was high, there was a stronger difference between the sentence and the word list condition compared to when accuracy was low, with stronger accuracy for the sentence condition (Figure 7A).
No relation between accuracy and power was found. For the connectivity analysis we found a significant condition*accuracy interaction for the STG-MTG connection (F(1,80.23) = 5.19, p = 0.025; Figure 7B). Independent of task, when accuracy was low the difference between sentence and word lists was stronger with higher WPLI fits for the sentence condition. After correcting for accuracy there was also a significant task*condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005), but not for the other tasks (p>0.1).”
The discussion now reads: “We found that across participants both the MI and the connectivity in temporal cortex influenced behavioural performance. Specifically, MTG-STG connections were, independent of task, related to accuracy. There was higher connectivity between MTG and STG for sentences compared to word lists at low accuracies. At high accuracies, we found stronger MTG tracking at phrasal rates (measured with MI) for sentences compared to word lists during the word combination task. These results suggest that tracking of phrasal structure in MTG is indeed relevant for understanding sentences compared to word lists. This was reflected in a general increase in delta connectivity differences when the task was difficult (Figure 7B). Participants might compensate for the difficulty using phrasal structure present in the sentence condition. When phrasal structure in sentences is accurately tracked (as measured with MI), performance is better when these rates are relevant (Figure 7A). These results point to a role for phrasal tracking in accurately understanding the higher order linguistic structure in sentences, even though more research is needed to verify this. It is evident that the connectivity and tracking correlations to behaviour do not explain all variation in the behavioural performance (compare Figure 1 with 3). Plainly, temporal tracking does not explain everything in language processing. Besides tracking there are many other components important for our designated tasks, such as memory load and semantic context, which are not captured by our current analyses.”
The findings are an important contribution to the ongoing debate in the field whether neuronal tracking at the phrasal time scale indicates linguistic structure processing or more general processes (e.g. chunking).
Reviewer #3 (Public Review):
This manuscript presents a MEG study aiming to investigate whether neural tracking of phrasal timescales depends on automatic language processing or specific tasks related to temporal attention. The authors collected MEG data of 20 participants as they listened to naturally spoken sentences or word lists during four different tasks (passive listening vs. syllable task vs. word task vs. phrase task). Based on mutual information and connectivity analyses, the authors found that (1) neural tracking at the phrasal band (0.8-1.1 Hz) was significantly stronger for the sentence condition compared to the word list condition across the classical language network, i.e., STG, MTG, and IFG; (2) neural tracking at the phrasal band was (at least at trend level) stronger for the phrase task than for other tasks in the IFG; (3) the IFG-STG connectivity was increased in the delta-band for the phrase task. Ultimately, the authors concluded that neural tracking of phrasal timescales relied on both automatic language processing and specific tasks.
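To make the notion of "tracking measured with mutual information" concrete, here is a minimal sketch of a generic binned (plug-in) MI estimate between two already band-limited signals, such as a speech envelope and a source time course. The estimator, function name, and bin count are illustrative assumptions; the text here does not specify which MI estimator the authors used.

```python
import numpy as np


def binned_mutual_information(x, y, n_bins=8):
    """Plug-in mutual information (in bits) between two 1-D signals.

    Each signal is rank-transformed into n_bins equally populated bins,
    and MI is computed from the resulting joint histogram.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def to_bins(v):
        ranks = np.argsort(np.argsort(v))  # 0 .. len(v) - 1
        return (ranks * n_bins) // len(v)  # bin index 0 .. n_bins - 1

    bx, by = to_bins(x), to_bins(y)
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (bx, by), 1.0)        # count co-occurrences
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of x, shape (n_bins, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of y, shape (1, n_bins)
    nz = joint > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))
```

Plug-in estimates of this kind are biased upwards for short signals, which is one reason condition comparisons rather than absolute MI values tend to be the quantity of interest.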
Overall, this study is trying to tackle an interesting question related to the contributing factors for neural tracking of linguistic structures. The study procedure and analyses are well executed, and the conclusions of this paper are mostly well supported by data. However, I do have several major concerns.
- The title of the manuscript uses the description "tracking of hierarchical linguistic structure". In general, hierarchical linguistic structures involve multiple linguistic units with different timescales, such as syllables, words, phrases, and sentences. In this study, however, the main analysis only focused on the phrasal band (0.8-1.1 Hz). It seemed that there was no significant stimulus- or task-effect on the word band or syllabic band (supplementary figures). Therefore, it is highly recommended that the authors modify the related descriptions, or explain why neural tracking of phrases can represent neural tracking of hierarchical linguistic structures in the current study.
We thank the reviewer for this comment. We meant to refer to the task manipulation directing attention to different levels of representation across the linguistic hierarchy. We have changed the title to “Neural tracking of phrases during spoken language comprehension is automatic and task-dependent.” We hope this resolves any inadvertent confusion we created. Furthermore, throughout the manuscript we now make sure to talk about effects occurring for phrasal tracking at low frequency bands and not across any hierarchical linguistic structure. We agree that our findings cannot speak for any task-dependent effects along the hierarchy, only that at the phrasal level there is a difference between sentences and word lists.
- In Methods, the authors employed MI analyses on three frequency bands: 0.8-1.1 Hz for the phrasal band, 1.9-2.8 Hz for the word band, and 3.5-5.0 Hz for the syllabic band (line 191-192). As the timescales of linguistic units are various and overlapped in natural speech, I wonder how the authors define the boundaries of these frequency bands, and whether these bands are proper for the naturally spoken stimuli in the current study. These important details should be clarified.
The frequency bands of the MI analysis were based on the stimuli, or in other words, are data driven. They reflect the syllabic, word, and phrasal rates in our stimulus set (calculated in Kaufeld et al., 2020). They were calculated by annotating the sentences for syllables, words, and phrases and converting the rates of these linguistic units to frequency ranges. The information has been added to the manuscript. We acknowledge that, unlike in our stimulus set, the boundaries of these bands can overlap in natural speech, and we now also state this (“While in our stimulus set the boundaries of the linguistic levels did not overlap, in natural speech the brain has an even more difficult task as there is no one-to-one match between band and linguistic unit [26]”, line number 211-213).
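The conversion from annotated unit rates to a frequency band can be sketched as below. This is a hedged illustration only: the function name is hypothetical, and taking the band as the minimum-to-maximum rate across stimuli is our assumption, since the response does not state how the range was derived from the annotations.

```python
from typing import Sequence, Tuple


def rate_band(unit_counts: Sequence[int],
              durations_s: Sequence[float]) -> Tuple[float, float]:
    """Return a (low, high) band in Hz from per-stimulus unit annotations.

    unit_counts[i] is the number of annotated units (e.g. syllables, words,
    or phrases) in stimulus i, and durations_s[i] is its duration in seconds.
    Assumption: the band spans the slowest to the fastest observed rate.
    """
    rates = [n / d for n, d in zip(unit_counts, durations_s)]
    return min(rates), max(rates)


# Toy values, not taken from the study: three 2-second stimuli
# containing 8, 10 and 9 syllables.
low_hz, high_hz = rate_band([8, 10, 9], [2.0, 2.0, 2.0])
```

With these toy values the per-stimulus rates are 4.0, 5.0 and 4.5 Hz, so the sketch returns a band of 4.0 to 5.0 Hz.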
- What is missing in the manuscript are the explanations of the correlation between behavioral performance and neural tracking. In Results, the behavioral performance shows significant differences across the active tasks (Figure 1), but the MI differences across the tasks are relatively weak in IFG (Figure 3). In addition, the behavioral performance only shows significant differences between the sentence and word list conditions during the phrasal task, but the MI differences between the conditions are significant in MTG during the syllabic, word, and phrasal tasks. Explanations for these inconsistent results are expected.
We answer this point together with point 4 below where we analyze the behavioral performance and the MEG responses.
- Since the behavioral performance of these active tasks is likely related to the temporal attention to relevant timescales of different linguistic units, I wonder whether there exist underlying neural correlates of behavioral performance (e.g., significant correlation between performance and mutual information). If so, it may be interesting and bring a new bright spot for the current study.
The influence of task difficulty is a relevant point to investigate (also see point 1 of reviewer 2 and point 4 of reviewer 3). To do so we added accuracy (per participant per condition) as a factor in the mixed model (as well as all interactions with task and condition) for the MI, power, and connectivity analyses at the phrasal rate/delta band. Note that as for the passive task there is no accuracy, we removed the passive task from the analyses. We could also only run models with random intercepts (not random slopes), due to the reduced number of degrees of freedom when adding the factor accuracy to the models.
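For readers who want the model structure spelled out, a random-intercept model of this kind could be written as below. This is a sketch under assumptions: the notation is introduced here and does not come from the manuscript, the response $y$ stands for MI, power, or WPLI at the phrasal rate/delta band, and accuracy $a$ is treated as a continuous covariate defined per participant, task, and condition.

```latex
% Illustrative random-intercept model; notation is an assumption, not the manuscript's.
% p: participant, t: task (syllable, word, word combination), c: condition (sentence, word list)
\begin{aligned}
y_{ptc} &= \beta_0 + \beta^{T}_{t} + \beta^{C}_{c} + \beta^{A} a_{ptc}
          + \beta^{TC}_{tc} + \beta^{TA}_{t}\, a_{ptc} + \beta^{CA}_{c}\, a_{ptc}
          + \beta^{TCA}_{tc}\, a_{ptc} + u_p + \varepsilon_{ptc}, \\
u_p &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ptc} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

In the Wilkinson-style notation used by many mixed-model packages, this corresponds roughly to `y ~ task * condition * accuracy + (1 | participant)`. A random-slope model would add participant-specific coefficients for task or condition, which requires more degrees of freedom than remain once accuracy and its interactions are included.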
For the MI analysis we only found an effect in MTG. Specifically, there was a three-way interaction between task, condition and accuracy (F(2, 91.9) = 3.4591, p = 0.036). To follow up on this three-way interaction we split the data per task. The condition*accuracy interaction was significant (uncorrected) only for the word combination task (F(1,24.8) = 5.296, p = 0.03) and not for any other task (p>0.1). In the word combination task, we found that the difference between sentences and word lists was strongest at high accuracies (see the figure below for the predicted values of the model). One way to interpret this finding is that stronger phrasal-rate MI tracking in MTG promotes phrasal-rate processing (as indicated by accuracy) more in sentences than in word lists.
MEG – behavioral performance relation. A) Predicted values for the phrasal band MI in the MTG for the word combination task separately for the two conditions. B) Predicted values for the delta band WPLI in the STG-MTG connection separately for the two conditions. Error bars indicate the 95% confidence interval of the fit. Colored lines at the bottom indicate individual datapoints.
For power we did not find any effect of accuracy. For the connectivity analysis we found a significant condition*accuracy interaction in the STG-MTG connectivity (F(1, 80.23) = 5.19, p = 0.025). The condition*accuracy interaction showed that lower accuracies were generally associated with stronger differences between the sentences and word lists (see figure attached; the opposite of the MI analysis). Thus, functional connections in the delta band are stronger during sentence processing when participants have difficulty with the task (independent of the task performed). This could indicate that low-frequency connections are more relevant for the sentence than the word list condition.
After correcting for accuracy there was also a significant task*condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005 corrected), but not for the other tasks (p>0.1).
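For reference, the debiased WPLI used as the connectivity measure can be sketched as follows. This is a minimal illustration of the usual pairwise formulation of the debiased squared-WPLI estimator, assuming one complex cross-spectral value per trial at a single frequency; it is not the toolbox implementation used for the analyses, and the function name and toy values are invented here.

```python
import math


def debiased_wpli_squared(cross_spectra):
    """Debiased estimate of the squared WPLI from per-trial cross-spectra.

    Uses only the imaginary parts Im_j of the cross-spectra:
        sum_{j != k} Im_j * Im_k  /  sum_{j != k} |Im_j| * |Im_k|
    computed through the identities
        sum_{j != k} Im_j * Im_k     = (sum Im)^2   - sum Im^2
        sum_{j != k} |Im_j| * |Im_k| = (sum |Im|)^2 - sum Im^2
    """
    imag = [complex(s).imag for s in cross_spectra]
    if len(imag) < 2:
        raise ValueError("at least two trials are required")
    sum_im = sum(imag)
    sum_abs = sum(abs(x) for x in imag)
    sum_sq = sum(x * x for x in imag)
    denominator = sum_abs ** 2 - sum_sq
    if denominator == 0:
        return math.nan  # undefined when at most one trial has a nonzero imaginary part
    return (sum_im ** 2 - sum_sq) / denominator


# Invented toy values: a consistent phase lead gives 1, balanced leads and lags do not.
consistent = [complex(0.5, 1.0), complex(-0.2, 2.0), complex(0.1, 3.0)]
balanced = [complex(0.5, 1.0), complex(0.5, -1.0), complex(0.5, 1.0), complex(0.5, -1.0)]
print(debiased_wpli_squared(consistent))
print(debiased_wpli_squared(balanced))
```

Because the diagonal terms are removed, the estimate can be slightly negative when there is no consistent phase lead or lag, which is expected for a debiased estimator.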
We added the results of the accuracy analyses to the main manuscript and added a dedicated section to our discussion (page 21-22). Adding accuracy did not remove any of the effects we report in the original analyses. Therefore, none of these findings change the interpretation of the results, as the task still had an influence on the MI responses of MTG and IFG. The effect of accuracy in the MTG refined the results, showing that the effect was strongest there for participants with high accuracies. This relationship suggests a functional role of tracking through phase alignment for understanding phrasal structure.
While the findings can explain some behavioral effects, we agree with the reviewer that the behavioral results and the MI results don’t align. We note that our use of tasks to guide attention to different timescales and linguistic representations differs from the use of, for example, a working memory task where only the correct trials contain the relevant cognitive process. In working memory type paradigms, the MEG data should indeed explain the behavioral response. Our study was designed to test for effects of task demands on the neural tracking response to speech and language. As we are only using the tasks to control attention, we do not attempt to explain behavior through the MEG data or differences in MI.
Thus, the phrasal tracking cannot explain all of the behavioral results (point 3). It is at this point unclear what could have caused this effect, but it is quite likely that neural sources outside the speech and language ROIs we selected are in play. We discuss this now.
The methods now read: “MEG-behavioural performance analysis: To investigate the relation between the MEG measures and the behavioural performance we repeated the analyses (MI, power, and connectivity) but added accuracy as a factor (together with the interactions with the task and condition factor). As there is no accuracy for the passive task, we removed this task from the analysis. We then followed the same analysis steps as before. Since we reduced our degrees of freedom, we could however only create random intercept and not random slope models”.
The results now read: “MEG-behavioural performance relation. We found for the MI analysis a significant effect of accuracy only in the MTG. Here, we found a three-way interaction between accuracy*task*condition (F(2, 91.9) = 3.459, p = 0.036). Splitting up for the three different tasks we found only an uncorrected significant effect for the condition*accuracy interaction for the phrasal task (F(1, 24.8) = 5.296, p = 0.03) and not for the other two tasks (p>0.1). In the phrasal task, we found that when accuracy was high, there was a stronger difference between the sentence and the word list condition compared to when accuracy was low, with stronger accuracy for the sentence condition (Figure 7A).
No relation between accuracy and power was found. For the connectivity analysis we found a significant condition*accuracy interaction for the STG-MTG connection (F(1,80.23) = 5.19, p = 0.025; Figure 7B). Independent of task, when accuracy was low the difference between sentence and word lists was stronger with higher WPLI fits for the sentence condition. After correcting for accuracy there was also a significant task*condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005), but not for the other tasks (p>0.1).”
The discussion now reads: “We found that across participants both the MI and the connectivity in temporal cortex influenced behavioural performance. Specifically, MTG-STG connections were, independent of task, related to accuracy. There was higher connectivity between MTG and STG for sentences compared to word lists at low accuracies. At high accuracies, we found stronger MTG tracking at phrasal rates (measured with MI) for sentences compared to word lists during the word combination task. These results suggest that indeed tracking of phrasal structure in MTG is relevant to understand sentences compared to word lists. This was reflected in a general increase in delta connectivity differences when the task was difficult (Figure 7B). Participants might compensate for the difficulty using phrasal structure present in the sentence condition. When phrasal structure in sentences is accurately tracked (as measured with MI), performance is better when these rates are relevant (Figure 7A). These results point to a role for phrasal tracking for accurately understanding the higher order linguistic structure in sentences even though more research is needed to verify this. It is evident that the connectivity and tracking correlations to behaviour do not explain all variation in the behavioural performance (compare Figure 1 with 3). Plainly, temporal tracking does not explain everything in language processing. Besides tracking there are many other components important for our designated tasks, such as memory load and semantic context which are not captured by our current analyses.”
-
Reviewer #1 (Public Review):
In this study, the authors set out to clarify the relationship between brain oscillations and different levels of speech (syllables, words, phrases) using MEG. They presented word lists and sentences and used task instructions to attempt to focus listeners' attention on different levels of linguistic analysis (syllables, words, phrases).
I came away with mixed feelings about the task design: following each stimulus (sentence or word list), participants were asked to (a) press a button (i.e. nothing related to what they heard), (b) indicate which of two syllables was heard, (c) indicate which of two words was heard, (d) indicate which pair of words was present in the correct order. This task is the critical manipulation in the study, as it is intended to encourage (or in the authors' words, "require") participants to focus on different timescales of speech (syllable, word, and phrase, respectively). I very much like the idea of keeping the physical stimuli unchanged, and manipulating attention through task demands - an elegant and effective approach. At the same time, I have reservations about the degree to which these task instructions altered attention during listening. My intuition is that, if I were a participant, I would just listen attentively, and then answer the question about the specific level. For example, I don't know that knowing I would be doing a "word pair" task, I would be attending at a slower rate than a "word" task, as in both cases I would be motivated to understand all of the words in the sentence. I fully acknowledge my introspection (n=1) may be flawed here, but nevertheless, any additional support validating the effect of these instructions would help the interpretation of the MEG results.
The analysis steps generally seem sensible and well-suited to answering the main claims of the study. Controlling for power differences between conditions through matching was a nice feature.
I had a concern about accuracy differences (as seen in Figure 1) across stimulus materials and tasks. In particular, for the phrase task, participants were more accurate for sentence stimuli than word list stimuli. I think this makes a lot of sense, as a coherent sentence will be easier to remember in order than a list of words. But, I did not see accuracy taken into account in any of the analyses. These behavioral differences raise the possibility that the MEG results related to the sentence > word list contrast in phrases (which seems one of the most interesting findings in IFG) simply reflect differences in accuracy.
With the caveat of the concern regarding accuracy differences, the research goals were clear and the conclusions were generally supported by the analyses.
-
Reviewer #2 (Public Review):
In a MEG study, the authors investigate as their main question whether neural tracking at the phrasal time scale reflects linguistic structure building (testing different conditions: sentences vs. word-lists) or an attentional focus on the phrasal time scale (testing different tasks, passive listening, syllable task, word task, word combination/phrasal scale task). They perform the following analyses at brain areas (ROIs: STG, IFG, MTG) of the language network: (1) Mutual information (MI) between the acoustic envelope and the delta band neuronal signals is analyzed. (2) Power in the delta band is analyzed. (3) Connectivity is analyzed using debiased WPLI. For all analyses, linear mixed-models are separately conducted for each ROI. The main finding is that the sentence compared to the word-list condition is more strongly tracked at the phrasal scale (MI). In STG the effect was task-independent; in MTG the effect only occurred for active tasks; and in IFG additionally, the word-combining/phrasal scale task resulted in higher tracking compared to all other tasks. The authors conclude that phrasal scale neural tracking reflects linguistic processing which takes place automatically, while task-related attention contributes additionally at IFG (interpreted as combinatorial hub involved in language and non-language processing). The findings are stable when power differences are controlled. The connectivity analysis showed increased connectivity in the delta band (phrasal time scale) between IFG-STG in the phrasal-scale compared to the passive task (adding to the IFG MI findings). (Additionally, they separately analyze neural tracking at the syllabic and word time scale, which however is not in the main focus).
Major strength/weaknesses of the methods and results:
A major strength of the results is that part of them replicate the authors' earlier findings (i.e. higher tracking at the phrasal time scale for sentences compared to word-lists; Kaufeld et al., 2020), while they complement this earlier work by showing that the effects are due to linguistic processing and not to an attentional focus on the phrasal time scale due to the task (at least in STG and MTG; while the task plays a role for the IFG tracking). Another strength is that a power control analysis is applied, which allows excluding spurious results due to condition differences in power. A weakness of the method is that analyses were applied separately per ROI, and combined across correct/incorrect trials (if I understood correctly), no trial-based analysis was conducted (which is related to how MI is computed). Furthermore, several methodological details could be clarified in the manuscript.
The authors achieved their aims by providing evidence that neuronal tracking at the phrasal time scale in STG and MTG depends on the presence of linguistic information at this scale rather than indicating an attentional focus on this time scale due to a specific task. Their results support the conclusion. Results would be strengthened by showing that these effects are not impacted by different amounts of correct/incorrect trials across conditions (if I understood that correctly).
The findings are an important contribution to the ongoing debate in the field whether neuronal tracking at the phrasal time scale indicates linguistic structure processing or more general processes (e.g. chunking).
-
Reviewer #3 (Public Review):
This manuscript presents a MEG study aiming to investigate whether neural tracking of phrasal timescales depends on automatic language processing or specific tasks related to temporal attention. The authors collected MEG data of 20 participants as they listened to naturally spoken sentences or word lists during four different tasks (passive listening vs. syllable task vs. word task vs. phrase task). Based on mutual information and connectivity analyses, the authors found that (1) neural tracking at the phrasal band (0.8-1.1 Hz) was significantly stronger for the sentence condition compared to the word list condition across the classical language network, i.e., STG, MTG, and IFG; (2) neural tracking at the phrasal band was stronger (at least as a trend towards significance) for the phrase task than for the other tasks in the IFG; (3) the IFG-STG connectivity was increased in the delta-band for the phrase task. Ultimately, the authors concluded that neural tracking of phrasal timescales relied on both automatic language processing and specific tasks.
Overall, this study is trying to tackle an interesting question related to the contributing factors for neural tracking of linguistic structures. The study procedure and analyses are well executed, and the conclusions of this paper are mostly well supported by data. However, I do have several major concerns.
1. The title of the manuscript uses the description "tracking of hierarchical linguistic structure". In general, hierarchical linguistic structures involve multiple linguistic units with different timescales, such as syllables, words, phrases, and sentences. In this study, however, the main analysis only focused on the phrasal band (0.8-1.1 Hz). It seemed that there was no significant stimulus- or task-effect on the word band or syllabic band (supplementary figures). Therefore, it is highly recommended that the authors modify the related descriptions, or explain why neural tracking of phrases can represent neural tracking of hierarchical linguistic structures in the current study.
2. In Methods, the authors employed MI analyses on three frequency bands: 0.8-1.1 Hz for the phrasal band, 1.9-2.8 Hz for the word band, and 3.5-5.0 Hz for the syllabic band (line 191-192). As the timescales of linguistic units are various and overlapped in natural speech, I wonder how the authors define the boundaries of these frequency bands, and whether these bands are proper for the naturally spoken stimuli in the current study. These important details should be clarified.
3. What is missing in the manuscript are the explanations of the correlation between behavioral performance and neural tracking. In Results, the behavioral performance shows significant differences across the active tasks (Figure 1), but the MI differences across the tasks are relatively weak in IFG (Figure 3). In addition, the behavioral performance only shows significant differences between the sentence and word list conditions during the phrasal task, but the MI differences between the conditions are significant in MTG during the syllabic, word, and phrasal tasks. Explanations for these inconsistent results are expected.
4. Since the behavioral performance of these active tasks is likely related to the temporal attention to relevant timescales of different linguistic units, I wonder whether there exist underlying neural correlates of behavioral performance (e.g., significant correlation between performance and mutual information). If so, it may be interesting and bring a new bright spot for the current study.
-