Musical expertise is associated with improved neural statistical learning
Abstract
It remains unclear whether musical training leads to improvements in general cognitive abilities, such as statistical learning (SL). In standard SL paradigms, musicians have better performances than non-musicians. However, these better performances could be due to an improved ability to process sensory information, as opposed to an improved ability to learn sequence statistics. Unfortunately, these very different explanations make similar predictions on the performances averaged over multiple trials. To solve this controversy, we developed a Bayesian model and recorded electroencephalography (EEG) to study trial-by-trial responses. Our results confirm that musicians perform ~15% better than non-musicians at predicting items in auditory sequences that embed either simple or complex statistics. This higher performance is explained in the Bayesian model by parameters governing SL, as opposed to parameters governing sensory information processing. EEG recordings reveal a neural underpinning of the musicians’ advantage: the P300 amplitude correlates with the Bayesian model surprise elicited by each item, and does so more strongly for musicians than for non-musicians. Finally, early EEG components correlate with the Bayesian model surprise elicited by simple statistics, whereas late EEG components correlate with the Bayesian model surprise elicited by complex statistics, and do so more strongly for musicians than for non-musicians. Overall, our results prove that musical expertise is associated with improved neural SL, and support music-based intervention to fine-tune general cognitive abilities.
Article activity feed
### Reviewer #3:
Summary
The manuscript presents an experiment in which participants listened to ten auditory sequences, generated with either first- or second-order statistical structure ("simple" vs "complex" SL respectively), and predicted 20 elements in each sequence during simultaneous EEG recording. Behavioural results showed that all participants performed better for simple than for complex sequences and that musicians performed better than non-musicians for both sequence types. A Bayesian model was developed with parameters controlling memory decay, sensory noise, model order (hierarchy) and selection noise, which were fitted to the responses of each participant. The results showed differences between musicians and non-musicians for parameters related to SL (model order, selection noise) but not for parameters related to stimulus processing (sensory noise and memory decay). Specifically, musicians showed evidence of higher-order prediction and lower selection noise. The EEG results linked increased amplitude at fronto-central electrodes at around 300 ms to modelled surprise for each participant, a link that was stronger for musicians than for non-musicians. Separate analyses for models of different order produced evidence for an early modulation around 200 ms for zeroth-order predictions, which did not differ between musicians and non-musicians, and a later modulation around 300 ms for first- and second-order predictions, which did differ between the two groups. These modulations were linked to the MMN and P300 respectively. The results are taken as evidence for better SL in musicians and discussed in terms of the Bayesian brain hypothesis.
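To make the terms in this summary concrete, the following is a minimal sketch of how trial-by-trial surprise can be derived from a learner that tracks order-K transition statistics with decaying memory. It is not the authors' model: it omits the sensory-noise and selection-noise parameters, and the function name, the `decay` and `prior` parameters, and the integer coding of the three sounds are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def surprise_trace(sequence, order, decay, n_items=3, prior=1.0):
    """Surprise (-log2 p) of each item under an order-`order` transition model.

    `sequence` holds integer item codes in range(n_items). `decay` in (0, 1]
    multiplies all stored counts after each observation (1.0 = perfect memory).
    `prior` is a pseudo-count that keeps probabilities away from zero.
    """
    counts = defaultdict(lambda: [0.0] * n_items)  # context -> decayed counts
    surprises = []
    for t, item in enumerate(sequence):
        # The context is the previous `order` items (shorter at sequence start,
        # and the empty tuple when order == 0).
        context = tuple(sequence[max(0, t - order):t])
        c = counts[context]
        p = (c[item] + prior) / (sum(c) + prior * n_items)
        surprises.append(-math.log2(p))
        # Leaky memory: every stored count fades, then the new observation is added.
        for ctx in counts:
            counts[ctx] = [decay * x for x in counts[ctx]]
        counts[context][item] += 1.0
    return surprises

# An alternating sequence is fully predictable from first-order transitions
# but not from item frequencies alone.
seq = [0, 1] * 20
first_order = surprise_trace(seq, order=1, decay=1.0)
zeroth_order = surprise_trace(seq, order=0, decay=1.0)
print(first_order[-1] < zeroth_order[-1])
```

In this sketch a higher `order` lets the learner exploit longer contexts, while a `decay` below 1 limits how much past evidence is retained, which mirrors the distinction between the model-order and memory-decay parameters described above.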
Substantive Concerns
-- p. 4, para. 2: I believe that the evidence for musicians showing better SL is less strong than presented in the manuscript. In particular, using different stimuli and methods, both Loui et al. (2010) and Rohrmeier et al. (2011) found no difference between musicians and non-musicians in statistical learning of auditory sequences. Furthermore, with regard to reference 7 in the manuscript, although some studies have found larger ERAN amplitudes in musicians than in non-musicians (Jentschke & Koelsch, 2009; Kim et al., 2011; Koelsch et al., 2007, 2002; Regnault et al., 2001), the differences are usually small and have not been replicated in all studies (e.g., Koelsch & Jentschke, 2008; Koelsch & Sammler, 2008; Miranda & Ullman, 2007; Steinbeis et al., 2006). The introduction and motivation for the experiment should be adapted to give a more detailed and balanced view of the literature, and the divergence between the present results and those of Loui et al. (2010) and Rohrmeier et al. (2011) should be discussed and accounted for.
-- I'm not sure complexity is the most appropriate term to use in distinguishing statistical regularities of different order, since different transition tables at a single given order could be described as varying in statistical complexity. Having introduced the term, why not stick to "higher-order" and "lower-order"?
-- p 7: "Control analysis revealed that musicians and non-musicians do not benefit from an overall increase in performances during the course of the experiment." But there should be an improvement during each individual sequence, right? Is it possible to demonstrate this?
-- I think the authors should analyse the interaction in Fig. 1B and report whether or not it is significant.
-- I noted that while the authors report the consistency between the model and participants, they do not report the average accuracy of the model, which should be included for completeness. It would be good to report both of these analyses separately for complex and simple sequences, given the significant difference in performance between them.
-- p. 15: clarify that the same transition matrix was used for all five sequences of a given order
-- p. 15: what were the inclusion/exclusion criteria for the groups of musicians and non-musicians? How were participants recruited? This is important, especially given the divergence between the present findings and previous results (as noted above).
-- p. 16: are there any consequences of the fact that participants were aware of the probabilistic nature of the sequences and the differences between the two sequence types? Again, this seems to me to be an important divergence from other SL studies which could impact on the behavioural and neural effects observed and should, therefore, be discussed.
-- p. 16: "one participant was removed" - musician or non-musician?
-- p. 18: why was FCz used as the reference?
-- there are some inconsistencies in the way the model parameters are named - e.g., "late noise" in Supp. Figure 5. Please check through and use consistent terms throughout.
-- To facilitate replication and follow-up research, I would encourage the authors to make their data and model openly available.
### Reviewer #2:
The paper compares musicians' behavior and ERP responses to those of non-musicians with the following statement in the abstract:
"these better performances could be due to an improved ability to process sensory information, as opposed to an improved ability to learn sequence statistics. Unfortunately, these very different explanations make similar predictions on the performances averaged over multiple trials. To solve this controversy, we developed a Bayesian model and recorded electroencephalography (EEG) to study trial-by-trial responses."
The authors claim:
"This higher performance is explained in the Bayesian model by parameters governing SL, as opposed to parameters governing sensory information processing." This is correct but meaningless: the experiment does not challenge sensory noise, since the 3 sounds used are so distinct that sensory noise is zero in the two groups. Given that basic design, this phrasing is not only too strong, it is improper.
My understanding is that there are two actual observations in the paper:
Musicians' learning of second-order Markov statistics is better than that of non-musicians, based on parameter fitting of a Bayesian model of their behavior in answering explicit questions regarding which sound (of 3 very distinct options) should come next.
ERP measures of musicians, specifically the P300, are more sensitive to these statistics, as evident in the magnitude of the response with respect to the predictability/surprise of the sound based on serial statistics.
These claims are interesting, but I am not convinced by the claim of specificity. I think the data (and previous studies) suggest that musicians do better with sound-related judgments in all respects.
I am not convinced that the model adds information, since it explains the data as well as single accuracy numbers do (or did I miss something?). So I am not convinced that this trial-by-trial analysis adds information.
With respect to the specific model parameters:
Sensory noise is zero - the sounds are quite distinct. This is not an observation - this is how the experiment was designed. The authors admit that (indeed, any study that focused on sensory discrimination found an advantage in musicians) but then claim specificity, particularly in the abstract.
Regarding rate of decay - I wonder if this is relevant to overall performance when the task probes only up to second-order serial statistics. The memory retained may be sufficient for the task in both groups. The relevance of this parameter should be clarified.
Thus the lack of group difference in these parameters probably tells us about the experiment rather than the groups.
Similarly, musicians' ERP responses are larger. But the early difference is not addressed at all. Is the earlier response sensitive to simpler statistics, but in a similar way in both populations? It can't be, since the responses have a different magnitude. The authors base their analysis on the MEG analysis in their 2019 paper. I tried to do the exact comparison, and wasn't sure about the mapping to components - please clarify the exact similarity.
Thus - overall - I am not sure that the model analysis provides new conceptual insights.
### Reviewer #1:
In this work, the authors used a combination of modelling, behavioral methods and EEG to understand whether sensitivity to the statistical structure of unfolding sound sequences differs between musicians and non-musicians. Overall they demonstrate that musicians are better than non-musicians at predicting forthcoming items. Modelling suggests that this advantage arises because they estimate higher-order transition probabilities than non-musicians. The analysis of EEG data recorded during task performance showed that the amplitude of the P3 correlated with item predictability. Further analyses suggested that musicians and non-musicians have similar responses to surprise in simple sequences, with divergence between the groups occurring for higher-order transition probabilities.
I have several concerns about task design, analysis and interpretation of the data, which are detailed below:
The EEG data are recorded whilst participants are performing the behavioral prediction task. Though probe trials occurred rarely, it is conceivable that participants were making an active judgement for each sequence item. There is therefore a concern that the measured EEG data would reflect this aspect (active task performance) rather than automatic SL. This makes conclusions about "neural statistical learning" (e.g. as in the title) difficult to make.
In the results section the authors consider various differences between the musician and non-musician groups that could lead to differences in performance. One aspect that does not seem to be considered is that of attention, or task engagement. Is it possible that the musician participants were simply more engaged with, or less bored by, the task? The EEG data (figure 3) are consistent with this interpretation, showing overall substantially larger responses in the musicians relative to the non-musicians.
Relatedly, is it possible that the results in Figure 3C are at least partly related to the overall amplitude differences between groups? Higher SNR in the musician group may lead to higher beta values. One way around this is to normalize the data (e.g. based on the P1 response) before computing the correlations.
Figure 4: can you show the ERP data on which the beta values are based?
Figure 4: the authors seek to conclude that the two groups have similar responses to surprise in simple statistical contexts (K=0), with divergence occurring for more complex statistical structure. However, they do not provide statistics to support this claim. It is not enough to show no significant difference between groups for K=0 but significant differences for K=1 and K=2: you need to demonstrate an interaction.
More broadly, though, I do not understand the theoretical implications of this finding: why would the brain response to K=0 occur earlier than that to K=2? Shouldn't the prediction already be formed before sound onset (especially given the relatively slow sequence rate)?
Discussion: "Our results shed light on the musical training induced plasticity". This statement confuses correlation with causation. The authors discuss this reservation later in the discussion, but the statement should be removed altogether.
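The normalization suggested above for Figure 3C (scaling each participant's responses by an early component such as the P1 before relating them to surprise) can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, `p1_amplitude` is assumed to be a single per-participant value whose measurement window and electrode would need to be chosen, and a simple least-squares slope stands in for whatever regression the authors actually used.

```python
import numpy as np

def normalized_beta(amplitudes, surprise, p1_amplitude):
    """Slope of single-trial amplitude on surprise after scaling by the P1 response.

    amplitudes   : single-trial EEG amplitudes for one participant
    surprise     : model surprise for the same trials
    p1_amplitude : that participant's P1 amplitude (assumed non-zero)
    """
    amplitudes = np.asarray(amplitudes, dtype=float)
    surprise = np.asarray(surprise, dtype=float)
    scaled = amplitudes / p1_amplitude
    # np.polyfit returns coefficients from highest degree down: (slope, intercept).
    slope, _intercept = np.polyfit(surprise, scaled, deg=1)
    return slope

# Two hypothetical participants with the same surprise sensitivity, one of whom
# simply has responses twice as large overall (including the P1).
surprise = np.array([0.5, 1.0, 1.5, 2.0])
small = 1.0 + 2.0 * surprise
large = 2.0 * small
print(normalized_beta(small, surprise, p1_amplitude=1.0))
print(normalized_beta(large, surprise, p1_amplitude=2.0))
```

Without the scaling step, the second participant's slope would be twice the first's purely because of overall response size, which is the confound raised in the comment on Figure 3C.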
## Preprint Review
This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.
### Summary:
This work constitutes an innovative and timely combination of modelling, behaviour and EEG to understand potential differences in SL abilities between musicians and non-musicians. However, as detailed below, we have many concerns regarding the modelling, experimental design and interpretation of the results.
Our major concerns are summarized here (and further elaborated in the individual reviews below):
Modelling: please report the accuracy of the model and whether this differs between groups.
You should analyse the interaction in Fig. 1B and report whether or not it is significant.
Relatedly, there appears to be an inconsistency between the behavioural results and the modelling. In the behavioural data you report main effects of musicianship and of sequence complexity. Modelling of these data suggests that whilst the K for musicians is higher than for non-musicians, it is substantially above 1 for both. If anything, this should predict larger differences between groups at larger K than at smaller K, which is different from what is seen behaviourally. A similar inconsistency is present between the behavioural results and the results in figure 4 (see below). This requires careful consideration.
Can you do more to convince the reader that the model is performing well? Is the fit good, and how does it vary across participants? Does rate of memory decay affect performance at all? Can you show good versus poor performers within the same group, and do the parameters also vary there?
It is important that you address the issues related to participants being aware of the stimulus construction. Are there any consequences of the fact that participants were aware of the probabilistic nature of the sequences and the differences between the two sequence types? This seems to be an important divergence from other SL studies which could impact on the behavioural and neural effects observed and should, therefore, be discussed.
The EEG data are recorded whilst participants are performing the behavioural prediction task. Though probe trials occurred rarely, it is conceivable that participants were making an active judgement for each sequence item. There is therefore a concern that the measured EEG data would reflect this aspect (active task performance) rather than automatic SL. This makes conclusions about "neural statistical learning" (e.g. as in the title) difficult to make.
In the results section the authors consider various differences between the musician and non-musician groups that could lead to differences in performance. One aspect that does not seem to be considered is that of attention, or task engagement. Is it possible that the musician participants were simply more engaged with, or less bored by, the task? The EEG data (figure 3) are consistent with this interpretation, showing overall substantially larger responses in the musicians relative to the non-musicians.
In general, we think the model has been constructed with due care and attention and we like the separation of parameters related to statistical learning (model order and selection noise) and more general aspects of perception and cognition (sensory noise and memory decay). We think the difficulties arise in the relationship between the model and the experiment. Specifically, the sensory noise model parameter reveals very little in the analysis of this data because the sounds were so readily distinguishable, which appears to have been a deliberate choice in the experimental design, somewhat confusingly. The present stimulus set is therefore not suitable for distinguishing differences in sensory processing vs. SL between groups. We suggest that the authors could simply remove this parameter from the analysis and the paper would be clearer as a result. This would involve re-modelling and you will also have to reshape the way the experiment is motivated.
We have some questions about how the EEG data are analysed. In particular, the large amplitude difference between groups should be quantified, discussed and interpreted. We would also like to see stronger justification and discussion of why these differences are not affecting the main conclusions. We note that the authors provide R² results in the supplementary materials, but we feel that a better approach may involve normalizing the responses before modelling. Higher SNR in the musician group may lead to stronger correlations. One way around this is to normalize the data (e.g. based on the P1 response) before computing the correlations.
You should perform the appropriate statistical analysis to support the claims associated with Figure 4. You seek to conclude that the two groups have similar responses to surprise in simple statistical contexts (K=0), with divergence occurring for more complex statistical structure. However, you do not provide statistics to support this claim. It is not enough to show no significant difference between groups for K=0 but significant differences for K=1 and K=2. You need to demonstrate an interaction between group and model order. Additionally, it was not quite clear how modelling was performed here. We understand that you take surprise values from the model fitted to each participant but with the order fixed at 0, 1 or 2. This may mean that the other parameters are no longer optimal in the context of the new fixed K values, depending on how different these were from the fitted values for each participant, which might plausibly differ between the musicians and non-musicians. To address this, can you supplement the existing analysis with an analysis in which the K parameter is fixed at 0, 1 and 2, and the other parameters are re-optimised in the context of these fixed values? Please also provide information about how well each individual's data were fit, and whether there was a significant difference between musicians and non-musicians. In general, we think the authors should present the result in figure 4 more cautiously and also flesh out the interpretation in more detail in relation to the literature, along with a consideration of other potential interpretations. A small related point is that the term hierarchy is strongly tied to this interpretation, and we would prefer a more neutral term such as 'model order'.
The paper would benefit from a careful discussion of exactly what information, on top of that revealed with behaviour, is added by EEG and the significance of this in the context of the existing literature on expectation related ERP components.
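The request above for an interaction between group and model order can be illustrated with a small sketch. For two order levels, the group-by-order interaction is equivalent to comparing the groups on each participant's within-subject difference score, which is tested here by permutation. The function name and the input arrays are hypothetical, and with all three levels (K=0, 1, 2) a mixed ANOVA or a mixed-effects model would be the more usual choice; this is one possible approach, not the analysis the authors or reviewers specified.

```python
import numpy as np

def interaction_permutation_test(betas_a, betas_b, n_perm=10000, seed=0):
    """Permutation test of a group x order interaction for two order levels.

    betas_a, betas_b : arrays of shape (participants, 2), one per group, with
                       columns (beta at low order, beta at high order).
    Returns the observed difference of differences and a two-sided p-value.
    """
    rng = np.random.default_rng(seed)
    betas_a = np.asarray(betas_a, dtype=float)
    betas_b = np.asarray(betas_b, dtype=float)
    # Within-participant effect of model order.
    diff_a = betas_a[:, 1] - betas_a[:, 0]
    diff_b = betas_b[:, 1] - betas_b[:, 0]
    observed = diff_a.mean() - diff_b.mean()

    pooled = np.concatenate([diff_a, diff_b])
    n_a = len(diff_a)
    count = 0
    for _ in range(n_perm):
        # Shuffle group labels while keeping each participant's difference intact.
        perm = rng.permutation(pooled)
        stat = perm[:n_a].mean() - perm[n_a:].mean()
        if abs(stat) >= abs(observed):
            count += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value
```

A significant result from a test of this kind, rather than a significant group difference at one order alongside a non-significant one at another, is what would support the claim that the groups diverge only for higher-order structure.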