Auditory-motor surprisal reveals learning across multiple timescales during exploration and production
Curation statements for this article:-
Curated by eLife
eLife Assessment
This valuable study builds a novel auditory-motor paradigm to investigate how the brain learns associations between movements and their auditory consequences. Solid evidence is provided for early ERPs (50-100 ms latency) reflecting violations of established key-pitch mappings. The writing, however, could be streamlined to better emphasize the paper's key contribution, and some statistical analyses might be improved.
Abstract
Auditory-motor learning is critical in mastering the production of complex sounds, such as speaking and playing music. It is anchored upon internal models of interactions between actions and their sensory consequences, which are fine-tuned by minimizing the errors between the predicted and received sound. Here, we applied the concept of surprisal to a piano-playing task to probe the neural dynamics of sensorimotor learning. Specifically, during play, the key-pitch map was changed unpredictably among three map configurations: normal, inverted, and shifted-inverted. At the change boundaries, a signature of violated motor-to-auditory predictions was found in the auditory evoked responses at N100, which could not be attributed to either purely auditory surprisals or motor execution errors. This surprisal is modulated by short-term context, with greater surprise following longer periods of no map change, indicating that the brain continuously tracks short-term map contexts and rapidly adapts to them. In contrast, 30 minutes of extended goal-directed training on a single map modulated P50 amplitude only for that map, which can be explained by a slow, persistent modulation of motor predictions from the auditory signals. Hence, while auditory predictions from motor actions are rapidly and implicitly learned within short-term contexts, the complementary process of adjusting motor inferences from auditory inputs requires targeted training sustained over time. Our approach of studying auditory-motor surprisal in time-varying sequences reveals that auditory-motor learning is fast, context-sensitive, and shaped by both short- and long-term experience.
Significance statement
Understanding how the brain links motor actions with their sensory consequences is key to explaining how complex skills are acquired and how they adapt to changing environments. Prior work has shown that short-term sensory feedback supports rapid adaptation. Yet, the neural mechanisms underpinning the evolution of internal sensorimotor associations across different stages of learning remain to be elucidated. We address this challenge by extending the concept of surprisal, traditionally used in studies of perception, to the sensorimotor domain. Results show that surprisal responses are modulated by both short-term sensory feedback and longer-term training, suggestive of two distinct neural mechanisms underlying sensorimotor learning. These findings advance our understanding of the neural dynamics of sensorimotor learning and inform the development of technologies that interface with sensorimotor systems, such as virtual reality and brain–machine interfaces.
Article activity feed
Reviewer #1 (Public review):
Summary:
Zhang et al. report on an ambitious study that investigates multiple aspects of the neural and behavioral underpinnings of auditory-motor surprisal in the context of an auditory-motor learning paradigm (piano keyboard). Using an intricate design comprising several sub-parts and control procedures, they report that early ERPs (50-100 ms latency) reflect violations of established key-pitch mappings.
Strengths:
This is a carefully devised and executed study. The paradigm is quite intricate and, at the same time, addresses multiple aspects of auditory-motor learning, and does so in a rigorous way.
Weaknesses:
Perhaps because of the exhaustive approach, it is sometimes difficult to follow which parts of the experimental design the results come from; there are some questions regarding appropriate statistical methods, the inclusion/treatment of musical background in participants, and the nature (latency & extent) of the identified neural components that detect auditory-motor violations.
Reviewer #2 (Public review):
Summary:
Zhang et al. report an EEG study (n=18) of participants playing a keyboard where the correspondence between keys and pitches is varied to introduce sensory-motor mismatches (discrepancies between sensory inputs and expected sensory consequences of motor commands). They find that the auditory N100 amplitude is enhanced for the initial keystroke following a mapping switch but rapidly attenuates for subsequent keystrokes (showing rapid updating of the forward model), whereas the motor-related P50 amplitude only differentiates trained versus untrained mappings after 30 minutes of goal-directed practice (potentially showing timescales of inverse model updating). Using parallel univariate and mTRF decoding analyses, they conclude that forward models (mapping action to predicted sound) update almost instantly to track short-term context, while inverse models (mapping sound to motor commands) update slowly and require extended, targeted practice.
Strengths
(1) Methodological innovation:
The study utilizes an interesting, continuous auditory-motor paradigm that moves beyond standard trial-by-trial oddball designs, offering a more ecologically valid measure of trial-to-trial adaptation.
(2) Analytical elegance and rigor:
The combination of traditional univariate ERP analyses with multivariate temporal response function (mTRF) decoding is elegant, allowing the authors to successfully dissociate overlapping auditory and motor variance streams.
(3) The dissociation between the rapid adaptation of the N100 forward model and the slower adaptation of the P50 inverse model is interesting.
Weaknesses
(1) Confounded passive listening baseline:
The passive listening control condition lacks an orthogonal behavioural task (e.g., an occasional oddball detection task). Active playing inherently necessitates focused attention on auditory feedback to monitor performance, whereas passive playback does not. The globally weaker stimulus-evoked pattern at electrode Fz during passive listening strongly suggests that the absence of an N100 effect in this condition may simply reflect a lower state of attention, rather than isolating the absence of a motor-driven forward prediction. This is a particular concern because the purely sensory surprisal was also enhanced for "first" notes, which could itself lead to a stronger N1, but this effect may be masked.
(2) Overclaimed theoretical novelty:
The conceptual framing leans excessively on the authors' specific "MirrorNet" framework, presenting foundational, decades-old tenets of the motor control literature (i.e., unsupervised exploration for forward models vs. supervised skill acquisition for inverse models; Wolpert, Jordan, both in the nineties) as their own novel "conjectures." This theory-heavy introduction obscures the paper's actual empirical contribution, namely the design and the interesting question regarding the distinct temporal adaptation scales of forward versus inverse models. I think some rewriting can improve the paper.
(3) Misplaced surprisal terminology:
In a similar vein, I find the use of the term "auditory-motor surprisal" more theoretical grandstanding than actually useful. The significance statement claims to "extend this principle from sensory processing" but in fact, the concept of sensory motor unexpectedness is again a staple of the forward motor literature. Moreover, nowhere in the paper do they actually estimate sensorimotor surprisal. While the authors compute surprisal for their auditory baseline using IDyOM, their central sensorimotor analysis relies entirely on a simple categorical mismatch (first vs. subsequent keystrokes). The phenomenon can equally be referred to by its established nomenclature: "sensorimotor mismatch" or "sensory motor unexpectedness".
(4) Incremental conceptual advance regarding the N100:
The paper frames the N100 finding as a major discovery, but as far as I know, the attenuation of the auditory N1 to self-generated sounds via accurate motor prediction, and its enhancement during sensorimotor mismatch, is one of the most heavily documented phenomena in the auditory-motor literature (e.g. Timm et al., 2013; Bendixen et al., 2012, 2013). As far as I'm concerned, the authors should clarify that the novelty lies in the elegant new design, which provides a new way to correct for non-sensory-specific motor-induced attenuation, and in characterizing the distinct adaptation timescales of forward versus inverse models, not in demonstrating N100 modulation by sensorimotor mismatch, which is already well documented.
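To illustrate the distinction drawn in point (3), the following minimal Python sketch contrasts a graded surprisal value, defined as -log2(p) for an event of probability p, with the categorical first-versus-subsequent keystroke label. It is not the authors' analysis: the function names, the example map sequence, and the probabilities are all hypothetical, and no estimate of sensorimotor probabilities from the study is implied.

```python
import math


def surprisal_bits(p):
    """Shannon surprisal, -log2(p), of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


def first_after_switch(map_labels):
    """Categorical label: True where the map differs from the previous keystroke."""
    return [i > 0 and cur != map_labels[i - 1]
            for i, cur in enumerate(map_labels)]


# Hypothetical sequence of key-pitch maps, one label per keystroke.
maps = ["normal", "normal", "inverted", "inverted", "inverted", "normal"]
labels = first_after_switch(maps)

# Hypothetical probabilities a model might assign to the heard pitch.
graded = [surprisal_bits(p) for p in (0.5, 0.25)]
```

The categorical label takes only two values, whereas surprisal varies continuously with the assumed probability, which is why the two measures are not interchangeable.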