Goal-directed vocal planning in a songbird

Curation statements for this article:
  • Curated by eLife

    eLife logo

    eLife assessment

    This important work identifies a previously uncharacterized capacity for songbird to recover vocal targets even without sensory experience. The evidence supporting this claim is convincing, with technically difficult and innovative experiments exploring goal-directed vocal plasticity in deafened birds. This work has broad relevance to the fields of vocal and motor learning.

This article has been Reviewed by the following groups

Read the full article See related articles

Abstract

Songbirds’ vocal mastery is impressive, but to what extent is it a result of practice? Can they, based on experienced mismatch with a known target, plan the necessary changes to recover the target in a practice-free manner without intermittently singing? In adult zebra finches, we drive the pitch of a song syllable away from its stable (baseline) variant acquired from a tutor, then we withdraw reinforcement and subsequently deprive them of singing experience by muting or deafening. In this deprived state, birds do not recover their baseline song. However, they revert their songs towards the target by about one standard deviation of their recent practice, provided the sensory feedback during the latter signaled a pitch mismatch with the target. Thus, targeted vocal plasticity does not require immediate sensory experience, showing that zebra finches are capable of goal-directed vocal planning.

Article activity feed

  1. Author Response

    The following is the authors’ response to the original reviews.

    eLife assessment

    This important work identifies a previously uncharacterized capacity for songbirds to recover vocal targets even without sensory experience. While the evidence supporting this claim is solid, with innovative experiments exploring vocal plasticity in deafened birds, additional behavioral controls and analyses are necessary to shore up the main claims. If improved, this work has the potential for broad relevance to the fields of vocal and motor learning.

    We were able to address the requests for additional behavioral controls about the balancing of the groups (reviewer 1) and the few individual birds that showed a different behavior (reviewer 2) without collecting any further data. See our detailed replies below.

    Public Reviews:

    Reviewer #1 (Public Review):

    Summary:

    Zai et al test if songbirds can recover the capacity to sing auditory targets without singing experience or sensory feedback. Past work showed that after the pitch of targeted song syllables is driven outside of birds' preferred target range with external reinforcement, birds revert to baseline (i.e. restore their song to their target). Here the authors tested the extent to which this restoration occurs in muted or deafened birds. If these birds can restore, this would suggest an internal model that allows for sensory-to-motor mapping. If they cannot, this would suggest that learning relies entirely on feedback-dependent mechanisms, e.g. reinforcement learning (RL). The authors find that deafened birds exhibit moderate but significant restoration, consistent with the existence of a previously under-appreciated internal model in songbirds.

    Strengths:

    The experimental approach of studying vocal plasticity in deafened or muted birds is innovative, technically difficult, and perfectly suited for the question of feedback-independent learning. The finding in Figure 4 that deafened birds exhibit subtle but significant plasticity toward restoration of their pre-deafening target is surprising and important for the songbird and vocal learning fields, in general.

    Weaknesses:

    The evidence and analyses related to the directed plasticity in deafened birds are confusing, and the magnitude of the plasticity is far less than the plasticity observed in control birds with intact feedback. The authors acknowledge this difference in a two-system model of vocal plasticity, but one wonders why the feedback-independent model, which could powerfully enhance learning speed, is weak in this songbird system.

    We fully agree with the reviewer. This surprising weakness applies to birds’ inability rather than our approach for characterizing it.

    There remains some confusion about the precise pitch-change methods used to study the deafened birds, including the possibility that a critical cohort of birds was not suitably balanced in a way where deafened birds were tested on their ability to implement both pitch increases and decreases toward target restoration.

    Both deaf groups were balanced: (dLO and WNd) were balanced in that half of the birds (5/10 WNm and 4/8 dLO) shifted their pitch up (thus target restoration corresponded to decreasing pitch) and half of the birds (5/10 WNd and 4/8 dLO) shifted their pitch down (thus target restoration corresponded to increasing pitch), see Methods.

    To clarify the precise pitch-change method used, we added to the methods an explanation about why we used the sensitivity index 𝒅′ in Fig. 4:

    We used sensitivity 𝒅′ relative to the last 2 h of WN/LO instead of NRP because we wanted to detect a pitch change, which is the realm of detection theory, i.e. 𝒅′. Furthermore, by measuring local changes in pitch relative to the last 2 h of WN/LO reinforcement, our measurements are only minimally affected by the amount of reinforcement learning that might have occurred during this 2 h time window — choosing an earlier or longer window would have blended reinforced pitch changes into our estimates. Last but not least, changes in the way in which we normalized 𝒅’ values — dividing by 𝑺𝑩, — or using the NRP relative to the last 2 h of WN/LO did not qualitatively change the results shown in Fig. 4D.

    Reviewer #2 (Public Review):

    Summary:

    This paper investigates the role of motor practice and sensory feedback when a motor action returns to a learned or established baseline. Adult male zebra finches perform a stereotyped, learned vocalization (song). It is possible to shift the pitch of particular syllables away from the learned baseline pitch using contingent white noise reinforcement. When the reinforcement is stopped, birds will return to their baseline over time. During the return, they often sing hundreds of renditions of the song. However, whether motor action, sensory feedback, or both during singing is necessary to return to baseline is unknown.

    Previous work has shown that there is covert learning of the pitch shift. If the output of a song plasticity pathway is blocked during learning, there is no change in pitch during the training. However, as soon as the pathway is unblocked, the pitch immediately shifts to the target location, implying that there is learning of the shift even without performance. Here, they ask whether the return to baseline from such a pitch shift also involves covert or overt learning processes. They perform a series of studies to address these questions, using muting and deafening of birds at different time points. learning.

    Strengths:

    The overall premise is interesting and the use of muting and deafening to manipulate different aspects of motor practice vs. sensory feedback is a solid approach.

    Weaknesses:

    One of the main conclusions, which stems primarily from birds deafened after being pitch-shifted using white noise (WNd) birds in comparison to birds deafened before being pitchshifted with light as a reinforcer (LOd), is that recent auditory experience can drive motor plasticity even when an individual is deprived of such experience. While the lack of shift back to baseline pitch in the LOd birds is convincing, the main conclusion hinges on the responses of just a few WNd individuals who are closer to baseline in the early period. Moreover, only 2 WNd individuals reached baseline in the late period, though neither of these were individuals who were closer to baseline in the early phase. Most individuals remain or return toward the reinforced pitch. These data highlight that while it may be possible for previous auditory experience during reinforcement to drive motor plasticity, the effect is very limited. Importantly, it's not clear if there are other explanations for the changes in these birds, for example, whether there are differences in the number of renditions performed or changes to other aspects of syllable structure that could influence measurements of pitch.

    We thank the reviewer for these detailed observations. We looked into the reviewer’s claim that our main conclusion of revertive pitch changes in deaf birds with target mismatch experience hinges on only few WNd birds in the early period.

    When we remove the three birds that were close to baseline (NRP=0) in the early period, we still get the same trend that WNd birds show revertive changes towards baseline: Early 𝒅’ = −𝟎. 𝟏𝟑, 𝒑 = 𝟎. 𝟐𝟒, tstat = −𝟎.𝟕𝟒, 𝒅𝒇 = 𝟔, 𝑵 = 𝟕 birds, one-sided t-test of H0: 𝒅′ = 𝟎; Late 𝒅’ = −𝟏. 𝟐𝟔, 𝒑 = 𝟎. 𝟎𝟖, tstat = −𝟏.𝟔𝟑, 𝒅𝒇 = 𝟔, 𝑵 = 𝟕 birds, one-sided t-test of H0: 𝒅′ = 𝟎. Furthermore, even without these three birds, bootstrapping the difference between WNd and dC birds shows the same trend in the early period (p=0.22) and a significant reversion in the late period (p<0.001). Thus, the effect of reversion towards baseline in the late period is robustly observed on a population level, even when discounting for three individual birds that the reviewer suspected would be responsible for the effect.

    Moreover, note that there are not two but three WNd individuals that reached baseline in the late period (see Figure 2C, D). One of them was already close to baseline in the early period and another one was already relatively close, too.

    Also, the considerable variability among birds is not surprising, it is to be expected that the variability across deaf birds is large because of their ongoing song degradation that might lead to a drift of pitch over time since deafening.

    Last but not least, see also our multivariate model (below).

    With regards to the “differences in the number of renditions” that could explain pitch changes: Deaf birds sing less after deafening than hearing birds: they sing less during the first 2 hours (early): 87±59 renditions (WNd) and 410±330 renditions (dLO) compared to 616±272 renditions (control birds). Also, WN deaf birds sing only 4300±2300 motif renditions between the early and late period compared to the average of 11000±3400 renditions that hearing control birds produce in the same time period. However, despite these differences, when we provide WNd birds more time to recover, namely 9 days after the early period, they sung on average 12000±6000 renditions, yet their NRP was still significantly different from zero (NRP = 0.37, p=0.007, tstat=3.47, df=9). Thus, even after producing more practice songs, deaf birds do not recover baseline pitch and so the number of songs alone cannot explain why deaf birds do not fully recover pitch. We conclude that auditory experience seems to be necessary to recover song.

    We added this information to the Results.

    In this context, note that the interesting part of our work is not that deaf birds do not fully recover, but that they recover anything at all (“main conclusion”, Fig. 4). The number of songs does not explain why deaf birds with mismatch experience (WNd, singing the least and singing significantly less than control birds, p=2.3*10-6, two-tailed t-test) partially revert song towards baseline, unlike deaf birds without mismatch experience (dLO, singing significantly more than WNd birds, p=0.008, and indistinguishable from control birds, p=0.1). We added this information to the Results section.

    With regards to ‘other aspects of syllable structure’: We did not look into this. Regardless of the outcome of such a hypothetical analysis, whether other syllable features change is irrelevant for our finding that deaf birds do not recover their target song. Nevertheless, note that in Zai et al. 2020 (supplementary Figure 1), we analyzed features other than pitch change in deaf birds. Absolute change in entropy variance was larger in deaf birds than in hearing birds, consistent with the literature on song degradation after deafening (Lombardino and Nottebohm, 2000, Nordeen and Nordeen 2010 and many others). In that paper, we found that only pitch changes consistently along the LO direction. All other features that we looked at (duration, AM, FM and entropy) did not change consistently with the LO contingency. We expect that a similar result would apply for the changes across the recovery period in WNd and dLO birds, i.e., that song degradation can be seen in many features and that pitch is the sole feature that changes consistently with reinforcement (LO/WN) direction.

    While there are examples where the authors perform direct comparisons between particular manipulations and the controls, many of the statistical analyses test whether each group is above or below a threshold (e.g. baseline) separately and then make qualitative comparisons between those groups. Given the variation within the manipulated groups, it seems especially important to determine not just whether these are different from the threshold, but how they compare to the controls. In particular, a full model with time (early, late), treatment (deafened, muted, etc), and individual ID (random variable) would substantially strengthen the analysis.

    We performed a full model of the NRP as the reviewer suggests and it supports our conclusions: Neither muting, deafening nor time without practice between R and E windows have a significant effect on pitch in the E window, but the interaction between deafening and time (late, L) results in a significant pitch change (fixed effect 0.67, p=2*10-6), demonstrating that deaf birds are significantly further away from baseline (NRP=0) than hearing birds in late windows, thereby confirming that birds require auditory feedback to recover a distant pitch target. Importantly, we find a significant fixed effect on pitch in the direction of the target with mismatch experience (fixed effect -0.37, p=0.006), supporting our finding that limited vocal plasticity towards a target is possible even without auditory feedback.

    We included this model as additional analysis to our manuscript.

    The muted birds seem to take longer to return to baseline than controls even after they are unmuted. Presumably, there is some time required to recover from surgery, however, it's unclear whether muting has longer-term effects on syrinx function or the ability to pass air. In particular, it's possible that the birds still haven't recovered by 4 days after unmuting as a consequence of the muting and unmuting procedure or that the lack of recovery is indicative of an additional effect that muting has on pitch recovery. For example, the methods state that muted birds perform some quiet vocalizations. However, if birds also attempt to sing, but just do so silently, perhaps the aberrant somatosensory or other input from singing while muted has additional effects on the ability to regain pitch. It would also be useful to know if there is a relationship between how long they are muted and how quickly they return to baseline.

    We agree, it might be the case that muting has some longer-term effects that could explain why WNm birds did not recover pitch 4 days after unmuting. However, if such an effect exists, it is only weak. Arguing against the idea that a longer muting requires longer recovery, we did not find a correlation between the difference in NRP between early and late and 1. the duration the birds were muted (correlation coefficient = -0.50, p=0.20), and 2. the number of renditions the birds sung between early and late (correlation coefficient = 0.03, p=0.95), and 3. the time since they last sung the target song (last rendition of baseline, correlation coefficient = -0.43, p=0.29). Neither did we find a correlation between the early NRP and the time since the muting surgery (correlation coefficient = 0.26, p=0.53), suggesting that the lack of pitch recovery while muted was not due to a lingering burden of the muting surgery. We added these results to the results section.

    In summary, we used the WNm group to assess whether birds can recover their target pitch in the absence of practice, i.e. whether they recovered pitch in the early time period. Whether or not some long-term effect of the muting/unmuting procedure affects recovery does not impair the main finding we obtained from WNm birds in Figure 1 (that birds do not recover without practice).

    Reviewer #3 (Public Review):

    Summary:

    Zai et al. test whether birds can modify their vocal behavior in a manner consistent with planning. They point out that while some animals are known to be capable of volitional control of vocalizations, it has been unclear if animals are capable of planning vocalizations -that is, modifying vocalizations towards a desired target without the need to learn this modification by practicing and comparing sensory feedback of practiced behavior to the behavioral target. They study zebra finches that have been trained to shift the pitch of song syllables away from their baseline values. It is known that once this training ends, zebra finches have a drive to modify pitch so that it is restored back to its baseline value. They take advantage of this drive to ask whether birds can implement this targeted pitch modification in a manner that looks like planning, by comparing the time course and magnitude of pitch modification in separate groups of birds who have undergone different manipulations of sensory and motor capabilities. A key finding is that birds who are deafened immediately before the onset of this pitch restoration paradigm, but after they have been shifted away from baseline, are able to shift pitch partially back towards their baseline target. In other words, this targeted pitch shift occurs even when birds don't have access to auditory feedback, which argues that this shift is not due to reinforcement-learning-guided practice, but is instead planned based on the difference between an internal representation of the target (baseline pitch) and current behavior (pitch the bird was singing immediately before deafening).

    The authors present additional behavioral studies arguing that this pitch shift requires auditory experience of the song in its state after it has been shifted away from baseline (birds deafened early on, before the initial pitch shift away from baseline, do not exhibit any shift back towards baseline), and that a full shift back to baseline requires auditory feedback. The authors synthesize these results to argue that different mechanisms operate for small shifts (planning, does not need auditory feedback) and large shifts (reinforcement learning, requires auditory feedback).

    We thank the reviewer for this concise summary of our paper. To clarify, we want to point out that we do not make any statement about the learning mechanism birds use to make large shifts to recover their target pitch, i.e. we do not say that large shifts are learned by reinforcement learning requiring auditory feedback. We only show that large shifts require auditory feedback.

    The authors also make a distinction between two kinds of planning: covert-not requiring any motor practice and overt-requiring motor practice but without access to auditory experience from which target mismatch could be computed. They argue that birds plan overtly, based on these deafening experiments as well as an analogous experiment involving temporary muting, which suggests that indeed motor practice is required for pitch shifts.

    Strengths:

    The primary finding (that partially restorative pitch shift occurs even after deafening) rests on strong behavioral evidence. It is less clear to what extent this shift requires practice, since their analysis of pitch after deafening takes the average over within the first two hours of singing. If this shift is already evident in the first few renditions then this would be evidence for covert planning. This analysis might not be feasible without a larger dataset. Similarly, the authors could test whether the first few renditions after recovery from muting already exhibit a shift back toward baseline.

    This work will be a valuable addition to others studying birdsong learning and its neural mechanisms. It documents features of birdsong plasticity that are unexpected in standard models of birdsong learning based on reinforcement and are consistent with an additional, perhaps more cognitive, mechanism involving planning. As the authors point out, perhaps this framework offers a reinterpretation of the neural mechanisms underlying a prior finding of covert pitch learning in songbirds (Charlesworth et al., 2012).

    A strength of this work is the variety and detail in its behavioral studies, combined with sensory and motor manipulations, which on their own form a rich set of observations that are useful behavioral constraints on future studies.

    Weaknesses:

    The argument that pitch modification in deafened birds requires some experience hearing their song in its shifted state prior to deafening (Fig. 4) is solid but has an important caveat. Their argument rests on comparing two experimental conditions: one with and one without auditory experience of shifted pitch. However, these conditions also differ in the pitch training paradigm: the "with experience" condition was performed using white noise training, while the "without experience" condition used "lights off" training (Fig. 4A). It is possible that the differences in the ability for these two groups to restore pitch to baseline reflect the training paradigm, not whether subjects had auditory experience of the pitch shift. Ideally, a control study would use one of the training paradigms for both conditions, which would be "lights off" or electrical stimulation (McGregor et al. 2022), since WN training cannot be performed in deafened birds. This is difficult, in part because the authors previously showed that "lights off" training has different valences for deafened vs. hearing birds (Zai et al. 2020). Realistically, this would be a point to add to in discussion rather than a new experiment.

    We added the following statement to our manuscript:

    It is unlikely that dLO birds’ inability to recover baseline pitch is somehow due to our use of a reinforcer of a non-auditory (visual) modality, since somatosensory stimuli do not prevent reliable target pitch recovery in hearing birds (McGregor et al 2022).

    A minor caveat, perhaps worth noting in the discussion, is that this partial pitch shift after deafening could potentially be attributed to the birds "gaining access to some pitch information via somatosensory stretch and vibration receptors and/or air pressure sensing", as the authors acknowledge earlier in the paper. This does not strongly detract from their findings as it does not explain why they found a difference between the "mismatch experience" and "no mismatch experience groups" (Fig. 4).

    We added the following statement: Our insights were gained in deaf birds and we cannot rule out that deaf birds could gain access to pitch information via somatosensoryproprioceptive sensory modalities. However, such information, even if available, cannot explain the difference between the "mismatch experience” (WNd) and the "no mismatch experience" (dLO) groups, which strengthens our claim that the pitch reversion we observe is a planned change and not merely a rigid motor response (as in simple usedependent forgetting).

    More broadly, it is not clear to me what kind of planning these birds are doing, or even whether the "overt planning" here is consistent with "planning" as usually implied in the literature, which in many cases really means covert planning. The idea of using internal models to compute motor output indeed is planning, but why would this not occur immediately (or in a few renditions), instead of taking tens to hundreds of renditions?

    Indeed, what we call ‘covert planning’ refers to what usually is called ‘planning’ in the literature. Also, there seems to be currently no evidence for spontaneous overt planning in songbirds (which we elicited with deafening). Replay of song-like syringeal muscle activity can be induced by auditory stimuli during sleep (Bush, A., Doppler, J. F., Goller, F., and Mindlin, G. B. (2018), but to our knowledge there are no reports of similar replay in awake, non-singing birds, which would constitute evidence for overt planning.

    We cannot ascertain how fast birds can plan their song changes, but our findings are not in disagreement with fast planning. The smallest time window of analysis we chose is 2h, which sets a lower bound of the time frame within which we can measure pitch changes. Our approach is probably not ideally suited for determining the minimal planning time, because the deafening and muting procedures cause an increase in song variability, which calls for larger pitch sample sizes for statistical testing, and the surgeries themselves cause a prolonged period without singing during which we have no access to the birds’ planned motor output. Note that fast planning is demonstrated by the recent finding of instant imitation in nightingales (Costalunga, Giacomo, et al. 2023) and is evidenced by fast re-pitching upon context changes in Bengalese finches (Veit, L., Tian, L. Y., Monroy Hernandez, C. J., & Brainard, M. S., 2021).

    To resolve confusion, it would be useful to discuss and add references relating "overt" planning to the broader literature on planning, including in the introduction when the concept is introduced.

    Overt and covert planning are terms used in the literature on child development and on adult learning, see (Zajic, Matthew Carl, et al., Overt planning behaviors during writing in school-age children with autism spectrum disorder and attention-deficit/hyperactivity disorder, 2020) and (Abbas zare-ee, Researching Aptitude in a Process-Based Approach to Foreign Language Writing Instruction. Advances in Language and Literary Studies, 2014), and references therein.

    Indeed, muddying the interpretation of this behavior as planning is that there are other explanations for the findings, such as use-dependent forgetting, which the authors acknowledge in the introduction, but don't clearly revisit as a possible explanation of their results. Perhaps this is because the authors equate use-dependent forgetting and overt planning, in which case this could be stated more clearly in the introduction or discussion.

    We do not mean to strictly equate use-dependent forgetting and overt planning, although they can be related, namely when ‘use’ refers to ‘altered use’ as is the case when something about the behavior is missing (e.g. auditory feedback in our study), and the dependence is not just on ‘use’ but also on ‘experience’.

    We added the following sentence to the discussion: We cannot distinguish the overt planning we find from more complex use-and-experience dependent forgetting, since we only probed for recovery of pitch and did not attempt to push birds into planning pitch shifts further away from baseline.

    Recommendations for the authors:

    Reviewer #1 (Recommendations For The Authors):

    (1) The single main issue with this paper is in the section related to Figure 4, and the Figure itself - this is the most important part of the paper essential to buttress the claim of covert learning. However, there are several sources of confusion in the text, analyses, and figures. The key result is in Figure 4B, C - and, in the context of Figs 1-3, the data are significant but subtle. That is, as the authors state, the birds are mostly dependent on slow sensory feedback-dependent (possibly RL) mechanisms but there is a small component of target matching that evidences an internal model. One wonders why this capacity is so small - if they had a good internal model they'd be much faster and better at recovering target pitches after distortion-driven deviations even without sensory feedback.

    (1a) The analysis of the WNd and DLO reversions of pitch (related to Fig. 4) uses a d' analysis which is a pivot from the NRP analysis used in the rest of the paper. It is not clear why different analyses are being used here to compute essentially the same measure, i.e. how much did the pitch revert. It's also odd that different results are now obtained - Fig. 4 has a small but significant reversion of pitch in WNd birds but Fig. 2 shows no significant return to baseline.

    We did not test for reversion towards baseline in Fig. 2 and made no statement about whether there is a significant reversion or not. But when we do such a test, we find a significant reversion for WNd birds in the ‘late’ window (NRP=0.5, p=0.02, N=10, tstat=-1.77, two-tailed t-test), which agrees with Figure 4. In the ‘early’ window in Fig. 2, we find only a trend but no reversion (NRP = 0.76, p=0.11, n=10, tstat=-1.76), which contrasts with our findings in Figure 4. However, the discrepancy can be simply explained by the difference in time alignment that we detail in the Materials and Methods. Namely, in Figure 2, we measure pitch relative to the pitch in the morning on the day before, which is not a good measure of ‘reversion’ (since pitch had been reinforced further away during the day), which is why we do not present this analysis in the paper and dedicate a separate analysis in Figure 4 to reversion.

    (1b) Also in Fig. 4 is it the case that, as in the schematic of 4a, ALL birds in these experiments had their pitch pushed up - so that the return to baseline was all down? If this is the case the analysis may be contaminated by a pitch-down bias in deafened birds. This would ideally be tested with a balance of pitch-up and pitch-down birds in the pre-deafening period, and/or analysis of non-targeted harmonic stacks to examine their pitch changes. If non-targeted stacks exhibit pitch-down changes after deafening, then the reversion that forms the key discovery of this paper will be undermined. Please address.

    Both groups in Figure 4 were balanced (same number of birds were shifted their pitch up and down), see response to public review and Methods.

    (1c) After multiple re-reads and consultations with the Methods section I still do not understand the motivation or result for Figure 4E. Please provide clarification of the hypothesis/control being assessed and the outcome.

    Figure 4E does not add an additional result but strengthens our previous findings because we obtain the same result with a different method. The pitch of deaf birds tends to drift after deafening. To discount for this drift and the effect of time elapsed since deafening, we bootstrapped the magnitude of the pitch change in WNd and dLO birds by comparing them to dC birds in matched time windows. We modified the sentence in the results section to clarify this point:

    To discount for the effect of time elapsed since deafening and quantify the change in pitch specifically due to reinforcement, we bootstrapped the difference in 𝒅′ between dLO/WNd birds and a new group of dC birds that were deafened but experienced no prior reinforcement (see methods).

    (1d) Line 215. It's not clear in the text here how the WNd birds experience a pitch mismatch. Please clarify the text that this mismatch was experienced before deafening. This is a critical paragraph to set up the main claims of the paper. Also, it's not clear what is meant by 'fuel their plan'? I can imagine this would simply be a DA-dependent plasticity process in Area X that does not fuel a plan but rather re-wires and HVC timestep to medium spiny neurons whose outputs drive pitch changes - i.e. not a fueled plan but simply an RL-dependent re-mapping in the motor system. Alternatively, a change could result in plasticity in pallial circuits (e.g. auditory to HVC mappings) that are RL independent and invoke an inverse model along the lines of the author's past work (e.g. Ganguli and Hahnlsoer). This issue is taken up in the discussion but the setup here in the results is very confusing about the possible outcomes. This paragraph is vague with respect to the key hypotheses. It's possible that the WNd and DLO groups enable dissection of the two hypotheses above - because the DLO groups would presumably have RL signals but without recovery - but there remains a real lack of clarity over exactly how the authors are interpreting Fig 4 at the mechanistic level.

    WNd birds experience a pitch mismatch because while singing they hear that their pitch differs from baseline pitch, but the same is not true for dLO birds. We simply tested whether this experience makes a difference for reversion and it does. We added ‘before deafening’ to the paragraph and changed the wording of our hypothesis to make it clearer (we reworded ‘fuel their plan’). Mechanistic interpretations we left in the discussion. Without going to details, all we are saying is that birds can only plan to revert motor changes they are aware of in the first place.

    Minor issues

    The songs of deafened birds degrade, at a rate that depends on the bird's age. Younger crystalized birds degrade much faster, presumably because of lower testosterone levels that are associated with increased plasticity and LMAN function. Some background is needed on deafened birds to set up the WNd experiments.

    Despite deafening leading to the degradation of song (Lombardino and Nottebohm, 2000), syllable detection and pitch calculation were still possible in all deaf birds (up to 13-50 days after deafening surgery, age range 90-300 dph, n=44 birds).

    Since pitch shifting was balanced in both deaf bird groups (the same number of birds were up- and down-shifted), systematic changes in pitch post deafening (Lombardino and Nottebohm, 2000) will average out and so would not affect our findings.

    Lines 97-103. The paragraph is unclear and perhaps a call to a SupFig to show the lack of recovery would help. If I understand correctly, the first two birds did not exhibit the normal recovery to baseline if they did not have an opportunity to hear themselves sing without the WN. I am failing to understand this.

    In the early window (first 2 hours after unmuting) birds have not changed their pitch compared to their pitch in the corresponding window at the end of reinforcement (with matching time-of-day). We added ‘immediately after unmuting (early)’ to clarify this statement.

    Lines 68-69. What is the difference between (2) and (3)? Both require sensory representation/target to be mapped to vocal motor output. Please clarify or fuse these concepts.

    We fused the concept and changed the figure and explanation accordingly.

    Line 100. Please name the figure to support the claim.

    We marked the two birds in the Fig. 1H and added a reference in the text.

    Line 109. Is there a way to confirm / test if muted birds attempted to sing?

    Unfortunately, we do not have video recordings to check if there are any signs of singing attempts in muted birds.

    Line 296: Why 'hierarchically 'lower'?

    Lower because without it there is nothing to consolidate, i.e. the higher process can only be effective after the lower but not before. We clarified this point in the text.

    Past work on temporal - CAF (tcaf) by the Olveczky group showed that syllable durations and gaps could be reinforced in a way that does not depend on Area X and, therefore, related to the authors' discussion on the possible mechanisms of sensory-feedback independent recovery, may rely on the same neural substrates that Fig. 4 WNd group uses to recover. Yet the authors find in this paper that tCAF birds did not recover. There seems to be an oddity here - if covert recovery relies on circuits outside the basal ganglia and RL mechanisms, wouldn't t-CAF birds be more likely to recover? This is not a major issue but is a source of confusion related to the authors' interpretations that could be fleshed out.

    This is a good point, we reinvestigated the tCAF birds in the context of Fig 4 where we looked for pitch reversions towards baseline. tCAF birds do also revert towards baseline. We added this information to the supplement. We cannot say anything about the mechanistic reasons for lack of recovery, especially given that we did not look at brain-level mechanisms.

    Reviewer #2 (Recommendations For The Authors):

    The data presentation could be improved. It is difficult to distinguish between the early and late symbols and to distinguish between the colors for the individual lines on the plots or to match them with the points on the group data plots. In addition, because presumably, the points in plots like 2D are for the same individuals, lines connecting those points would be useful rather than trying to figure out which points are the same color.

    We added lines in Fig. 2D connecting the birds in early and late.

    The model illustrations (Fig 1A, Fig 5) are not intuitive and do not help to clarify the different hypotheses or ideas. I think these need to be reworked.

    We revised the model illustrations and hope they improved to clarify the different hypothesis.

    Some of the phrasing is confusing. Especially lines 157-158 and 256-257.

    Lines 157-158: we removed an instance of ‘WNd’, which was out of place.

    Lines 256-257: we rephrased to ‘showing that prior experience of a target mismatch is necessary for pitch reversion independently of auditory feedback’

    Reviewer #3 (Recommendations For The Authors):

    For Fig. 1, the conclusion in the text "Overall, these findings suggest that either motor practice, sensory feedback, or both, are necessary for the recovery of baseline song" is not aligned with the figure header "Recovery of pitch target requires practice".

    We rephrased the conclusion to: Overall, these findings rule out covert planning in muted birds and suggest that motor practice is necessary for recovery of baseline song.

    The use of the term "song experience" can be confusing as to whether it means motor or auditory experience. Perhaps replace it with "singing experience" or "auditory experience" where appropriate.

    We did the requested changes.

    Fig. 1A, and related text, reads as three hypotheses that the authors will test in the paper, but I don't think this turns out to the be the main goal (and if it is, it is not clear their results differentiate between hypotheses 1, 2, and 3). Perhaps reframe as discussion points and have this panel not be so prominent at the start, just to avoid this confusion.

    We modified the illustration in Fig 1A and simplified it. We now only show the 2 hypotheses that we test in the paper.

    Line 275-276, "preceding few hours necessitates auditory feedback, which sets a limit to zebra finches' covert planning ability". Did the authors mean "overt", not covert? Since their study focuses on overt planning.

    Our study focuses on covert planning in figure 1 and overt planning in subsequent figures.

    The purpose of the paragraph starting on line 278 could be more clear. Is the goal to say that overt planning and what has previously been described as use-dependent forgetting are actually the same thing? If not, what is the relationship between overt planning and forgetting? In other words, why should I care about prior work on use-dependent forgetting?

    We moved the paragraph further down where it does not interrupt the narrative. See also our reply to reviewer 3 on use-dependent forgetting.

    Line 294, "...a dependent process enabled by experience of the former...", was not clear what "former" is referring to. In general, this paragraph was difficult to understand. Line 296: Which is the "lower" process?

    We added explanatory parentheses in the text to clarify. We rephrased the sentence to ‘the hierarchically lower process of acquisition or planning as we find is independent of immediate sensory experience.’

    Line 295, the reference to "acquisition" vs. "retention". It is not clear how these two concepts relate to the behavior in this study, and/or the hierarchical processes referenced in the previous sentence. Overall, it is not clear how consolidation is related to the paper's findings.

    We added explanatory parentheses in the text and changed figure 5 to better explain the links.

    Line 305, add a reference to Warren et al. 2011, which I believe was the first study (or one of them) that showed that AFP bias is required for restoring pitch to baseline.

    We are citing Warren et al. 2011 in the sentence:

    Such separation also applies to songbirds. Both reinforcement learning of pitch and recovery of the original pitch baseline depend on the anterior forebrain pathway and its output, the lateral magnocellular nucleus of the anterior nidopallium (LMAN)(1).

    Line 310, "Because LMAN seems capable of executing a motor plan without sensory feedback", is this inferred from this paper (in which case this is an overreach) or is this referencing prior work (if so, which one, and please cite)?

    We changed the wording to ‘It remains to be seen whether LMAN is capable of executing a motor plans without sensory feedback’.

    Line 326, "which makes them well suited for planning song in a manner congruent with experience." I don't fully understand the logic. Can this sentence be clarified?

    We rephrased the sentence and added an explanation as follows: …which makes them well suited for executing song plans within the range of recent experience (i.e., if the song is outside recent experience, it elicits no LMAN response and so does not gain access to planning circuits).

  2. eLife assessment

    This important work identifies a previously uncharacterized capacity for songbird to recover vocal targets even without sensory experience. The evidence supporting this claim is convincing, with technically difficult and innovative experiments exploring goal-directed vocal plasticity in deafened birds. This work has broad relevance to the fields of vocal and motor learning.

  3. Reviewer #1 (Public Review):

    Summary: Zai et al test if songbirds can recover the capacity to sing auditory targets without singing experience or sensory feedback. Past work showed that after the pitch of targeted song syllables are driven outside of birds' preferred target range with external reinforcement, birds revert to baseline (i.e. restore their song to their target). Here the authors tested the extent to which this restoration occurs in muted or deafened birds. If these birds can restore, this would suggest an internal model that allows for sensory-to-motor mapping. If they cannot, this would suggest that learning relies entirely on feedback dependent mechanisms, e.g. reinforcement learning (RL). The authors find that deafened birds exhibit moderate but significant restoration, consistent with the existence of a previously under-appreciated internal model in songbirds.

    Strengths:

    The experimental approach of studying vocal plasticity in deafened or muted birds is innovative, technically difficult and perfectly suited for the question of feedback-independent learning. The finding in Figure 4 that deafened birds exhibit subtle but significant plasticity toward restoration of their pre-deafening target is surprising and important for the songbird and vocal learning fields, in general.

    In this revision, the authors suitably addressed the confusion about some statistical methods related to Fig. 4, where the main finding of vocal plasticity in deafened birds was presented.

    There remain minor issues in the presentation early in the results section and in Fig. 4 that should be straightforward to clarify in revision.

  4. Reviewer #3 (Public Review):

    Summary:

    Zai et al. test whether birds can modify their vocal behavior in a manner consistent with planning. They point out that while some animals are known to be capable of volitional control of vocalizations, it has been unclear if animals are capable of planning vocalizations-that is, modifying vocalizations towards a desired target without the need to learn this modification by practicing and comparing sensory feedback of practiced behavior to the behavioral target. They study zebra finches that have been trained to shift the pitch of song syllables away from their baseline values. It is known that once this training ends, zebra finches have a drive to modify pitch so that it is restored back to its baseline value. They take advantage of this drive to ask whether birds can implement this targeted pitch modification in a manner that looks like planning, by comparing the time course and magnitude of pitch modification in separate groups of birds who have undergone different manipulations of sensory and motor capabilities. A key finding is that birds who are deafened immediately before the onset of this pitch restoration paradigm, but after they have been shifted away from baseline, are able to shift pitch partially back towards their baseline target. In other words, this targeted pitch shift occurs even when birds don't have access to auditory feedback, which argues that this shift is not due to reinforcement-learning-guided practice, but is instead planned based on the difference between an internal representation of the target (baseline pitch) and current behavior (pitch the bird was singing immediately before deafening).

    The authors present additional behavioral studies arguing that this pitch shift requires auditory experience of song in its state after it has been shifted away from baseline (birds deafened early on, before the initial pitch shift away from baseline, do not exhibit any shift back towards baseline), and that a full shift back to baseline requires auditory feedback. The authors synthesize these results to argue that different mechanisms operate for small shifts (planning, which does not need auditory feedback) and large shifts (through a mechanism that requires auditory feedback).

    The authors also make a distinction between two kinds of planning: covert-not requiring any motor practice-and overt-requiring motor practice, but without access to auditory experience from which target mismatch could be computed. They argue that birds plan overtly, based on these deafening experiments as well as an analogous experiment involving temporary muting, which suggest that indeed motor practice is required for pitch shifts.

    Strengths:

    The primary finding (that partially restorative pitch shift occurs even after deafening) rests on strong behavioral evidence. It is less clear to what extent this shift requires practice, since their analysis of pitch after deafening takes the average over within the first two hours of singing. If this shift is already evident in the first few renditions then this would be evidence for covert planning. Technical hurdles, such as limited sample sizes and unstable song after surgical deafening, make this difficult to test. (Similarly, the authors could test whether the first few renditions after recovery from muting already exhibit a shift back towards baseline.)

    This work will be a valuable addition to others studying birdsong learning and its neural mechanisms. It documents features of birdsong plasticity that are unexpected in standard models of birdsong learning based on reinforcement and are consistent with an additional, perhaps more cognitive, mechanism involving planning. As the authors point out, perhaps this framework offers a reinterpretation of the neural mechanisms underlying a prior finding of covert pitch learning in songbirds (Charlesworth et al., 2012).

    A strength of this work is the variety and detail in its behavioral studies, combined with sensory and motor manipulations, which on their own form a rich set of observations that are useful behavioral constraints on future studies.

    Weaknesses:

    The argument that pitch modification in deafened birds requires some experience hearing their song in its shifted state prior to deafening (Fig. 4) is solid, but has an important caveat. Their argument rests on comparing two experimental conditions: one with and one without auditory experience of shifted pitch. However, these conditions also differ in the pitch training paradigm: the "with experience" condition was performed using white noise training, while the "without experience" condition used "lights off" training (Fig. 4A). It is possible that the differences in ability for these two groups to restore pitch to baseline reflects the training paradigm, not whether subjects had auditory experience of the pitch shift. Ideally, a control study would use one of the training paradigms for both conditions, which would be "lights off" or electrical stimulation (McGregor et al. 2022), since WN training cannot be performed in deafened birds. In the Discussion, in response to this point the authors point out that birds are known to recover their pitch shift if those shifts are driven using electrical stimulation as reinforcement (McGregor et al. 2022); however, it is arguably still relevant to know whether a similar recovery occurs for the "lights off" paradigm used here.

  5. eLife assessment

    This important work identifies a previously uncharacterized capacity for songbirds to recover vocal targets even without sensory experience. While the evidence supporting this claim is solid, with innovative experiments exploring vocal plasticity in deafened birds, additional behavioral controls and analyses are necessary to shore up the main claims. If improved, this work has the potential for broad relevance to the fields of vocal and motor learning.

  6. Reviewer #1 (Public Review):

    Summary: Zai et al test if songbirds can recover the capacity to sing auditory targets without singing experience or sensory feedback. Past work showed that after the pitch of targeted song syllables is driven outside of birds' preferred target range with external reinforcement, birds revert to baseline (i.e. restore their song to their target). Here the authors tested the extent to which this restoration occurs in muted or deafened birds. If these birds can restore, this would suggest an internal model that allows for sensory-to-motor mapping. If they cannot, this would suggest that learning relies entirely on feedback-dependent mechanisms, e.g. reinforcement learning (RL). The authors find that deafened birds exhibit moderate but significant restoration, consistent with the existence of a previously under-appreciated internal model in songbirds.

    Strengths:
    The experimental approach of studying vocal plasticity in deafened or muted birds is innovative, technically difficult, and perfectly suited for the question of feedback-independent learning. The finding in Figure 4 that deafened birds exhibit subtle but significant plasticity toward restoration of their pre-deafening target is surprising and important for the songbird and vocal learning fields, in general.

    Weaknesses:
    The evidence and analyses related to the directed plasticity in deafened birds are confusing, and the magnitude of the plasticity is far less than the plasticity observed in control birds with intact feedback. The authors acknowledge this difference in a two-system model of vocal plasticity, but one wonders why the feedback-independent model, which could powerfully enhance learning speed, is weak in this songbird system.

    There remains some confusion about the precise pitch-change methods used to study the deafened birds, including the possibility that a critical cohort of birds was not suitably balanced in a way where deafened birds were tested on their ability to implement both pitch increases and decreases toward target restoration.

  7. Reviewer #2 (Public Review):

    Summary: This paper investigates the role of motor practice and sensory feedback when a motor action returns to a learned or established baseline. Adult male zebra finches perform a stereotyped, learned vocalization (song). It is possible to shift the pitch of particular syllables away from the learned baseline pitch using contingent white noise reinforcement. When the reinforcement is stopped, birds will return to their baseline over time. During the return, they often sing hundreds of renditions of the song. However, whether motor action, sensory feedback, or both during singing is necessary to return to baseline is unknown.

    Previous work has shown that there is covert learning of the pitch shift. If the output of a song plasticity pathway is blocked during learning, there is no change in pitch during the training. However, as soon as the pathway is unblocked, the pitch immediately shifts to the target location, implying that there is learning of the shift even without performance. Here, they ask whether the return to baseline from such a pitch shift also involves covert or overt learning processes. They perform a series of studies to address these questions, using muting and deafening of birds at different time points. learning.

    Strengths: The overall premise is interesting and the use of muting and deafening to manipulate different aspects of motor practice vs. sensory feedback is a solid approach.

    Weaknesses: One of the main conclusions, which stems primarily from birds deafened after being pitch-shifted using white noise (WNd) birds in comparison to birds deafened before being pitch-shifted with light as a reinforcer (LOd), is that recent auditory experience can drive motor plasticity even when an individual is deprived of such experience. While the lack of shift back to baseline pitch in the LOd birds is convincing, the main conclusion hinges on the responses of just a few WNd individuals who are closer to baseline in the early period. Moreover, only 2 WNd individuals reached baseline in the late period, though neither of these were individuals who were closer to baseline in the early phase. Most individuals remain or return toward the reinforced pitch. These data highlight that while it may be possible for previous auditory experience during reinforcement to drive motor plasticity, the effect is very limited. Importantly, it's not clear if there are other explanations for the changes in these birds, for example, whether there are differences in the number of renditions performed or changes to other aspects of syllable structure that could influence measurements of pitch.

    While there are examples where the authors perform direct comparisons between particular manipulations and the controls, many of the statistical analyses test whether each group is above or below a threshold (e.g. baseline) separately and then make qualitative comparisons between those groups. Given the variation within the manipulated groups, it seems especially important to determine not just whether these are different from the threshold, but how they compare to the controls. In particular, a full model with time (early, late), treatment (deafened, muted, etc), and individual ID (random variable) would substantially strengthen the analysis.

    The muted birds seem to take longer to return to baseline than controls even after they are unmuted. Presumably, there is some time required to recover from surgery, however, it's unclear whether muting has longer-term effects on syrinx function or the ability to pass air. In particular, it's possible that the birds still haven't recovered by 4 days after unmuting as a consequence of the muting and unmuting procedure or that the lack of recovery is indicative of an additional effect that muting has on pitch recovery. For example, the methods state that muted birds perform some quiet vocalizations. However, if birds also attempt to sing, but just do so silently, perhaps the aberrant somatosensory or other input from singing while muted has additional effects on the ability to regain pitch. It would also be useful to know if there is a relationship between how long they are muted and how quickly they return to baseline.

  8. Reviewer #3 (Public Review):

    Summary:
    Zai et al. test whether birds can modify their vocal behavior in a manner consistent with planning. They point out that while some animals are known to be capable of volitional control of vocalizations, it has been unclear if animals are capable of planning vocalizations -that is, modifying vocalizations towards a desired target without the need to learn this modification by practicing and comparing sensory feedback of practiced behavior to the behavioral target. They study zebra finches that have been trained to shift the pitch of song syllables away from their baseline values. It is known that once this training ends, zebra finches have a drive to modify pitch so that it is restored back to its baseline value. They take advantage of this drive to ask whether birds can implement this targeted pitch modification in a manner that looks like planning, by comparing the time course and magnitude of pitch modification in separate groups of birds who have undergone different manipulations of sensory and motor capabilities. A key finding is that birds who are deafened immediately before the onset of this pitch restoration paradigm, but after they have been shifted away from baseline, are able to shift pitch partially back towards their baseline target. In other words, this targeted pitch shift occurs even when birds don't have access to auditory feedback, which argues that this shift is not due to reinforcement-learning-guided practice, but is instead planned based on the difference between an internal representation of the target (baseline pitch) and current behavior (pitch the bird was singing immediately before deafening).

    The authors present additional behavioral studies arguing that this pitch shift requires auditory experience of the song in its state after it has been shifted away from baseline (birds deafened early on, before the initial pitch shift away from baseline, do not exhibit any shift back towards baseline), and that a full shift back to baseline requires auditory feedback. The authors synthesize these results to argue that different mechanisms operate for small shifts (planning, does not need auditory feedback) and large shifts (reinforcement learning, requires auditory feedback).

    The authors also make a distinction between two kinds of planning: covert-not requiring any motor practice and overt-requiring motor practice but without access to auditory experience from which target mismatch could be computed. They argue that birds plan overtly, based on these deafening experiments as well as an analogous experiment involving temporary muting, which suggests that indeed motor practice is required for pitch shifts.

    Strengths:
    The primary finding (that partially restorative pitch shift occurs even after deafening) rests on strong behavioral evidence. It is less clear to what extent this shift requires practice, since their analysis of pitch after deafening takes the average over within the first two hours of singing. If this shift is already evident in the first few renditions then this would be evidence for covert planning. This analysis might not be feasible without a larger dataset. Similarly, the authors could test whether the first few renditions after recovery from muting already exhibit a shift back toward baseline.

    This work will be a valuable addition to others studying birdsong learning and its neural mechanisms. It documents features of birdsong plasticity that are unexpected in standard models of birdsong learning based on reinforcement and are consistent with an additional, perhaps more cognitive, mechanism involving planning. As the authors point out, perhaps this framework offers a reinterpretation of the neural mechanisms underlying a prior finding of covert pitch learning in songbirds (Charlesworth et al., 2012).

    A strength of this work is the variety and detail in its behavioral studies, combined with sensory and motor manipulations, which on their own form a rich set of observations that are useful behavioral constraints on future studies.

    Weaknesses:
    The argument that pitch modification in deafened birds requires some experience hearing their song in its shifted state prior to deafening (Fig. 4) is solid but has an important caveat. Their argument rests on comparing two experimental conditions: one with and one without auditory experience of shifted pitch. However, these conditions also differ in the pitch training paradigm: the "with experience" condition was performed using white noise training, while the "without experience" condition used "lights off" training (Fig. 4A). It is possible that the differences in the ability for these two groups to restore pitch to baseline reflect the training paradigm, not whether subjects had auditory experience of the pitch shift. Ideally, a control study would use one of the training paradigms for both conditions, which would be "lights off" or electrical stimulation (McGregor et al. 2022), since WN training cannot be performed in deafened birds. This is difficult, in part because the authors previously showed that "lights off" training has different valences for deafened vs. hearing birds (Zai et al. 2020). Realistically, this would be a point to add to in discussion rather than a new experiment.

    A minor caveat, perhaps worth noting in the discussion, is that this partial pitch shift after deafening could potentially be attributed to the birds "gaining access to some pitch information via somatosensory stretch and vibration receptors and/or air pressure sensing", as the authors acknowledge earlier in the paper. This does not strongly detract from their findings as it does not explain why they found a difference between the "mismatch experience" and "no mismatch experience groups" (Fig. 4).

    More broadly, it is not clear to me what kind of planning these birds are doing, or even whether the "overt planning" here is consistent with "planning" as usually implied in the literature, which in many cases really means covert planning. The idea of using internal models to compute motor output indeed is planning, but why would this not occur immediately (or in a few renditions), instead of taking tens to hundreds of renditions? To resolve confusion, it would be useful to discuss and add references relating "overt" planning to the broader literature on planning, including in the introduction when the concept is introduced. Indeed, muddying the interpretation of this behavior as planning is that there are other explanations for the findings, such as use-dependent forgetting, which the authors acknowledge in the introduction, but don't clearly revisit as a possible explanation of their results. Perhaps this is because the authors equate use-dependent forgetting and overt planning, in which case this could be stated more clearly in the introduction or discussion.

  9. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/7301755.

    Part 1. Summary 

    The goal of this paper is to observe whether targeted vocal plasticity in the songbird is a result of goal directed vocal planning independent of sensory feedback experience. The authors first look into whether or not a bird can recover song syllables without practice, in terms of syllable pitch. They carry out experiments where they shift the bird's pitch of song and then mute the birds for a duration of time, then unmute them to observe whether or not the pitch of their song changes from baseline. They find that after the muted state birds are not able to accurately recover their reinforced songs compared to birds undergoing normal song recovery, in unmanipulated control birds after removal of pitch altering reinforcement. They bring up the factor that invasive muting treatment might lead to decreased recovery potential. These findings dictate that either motor practice (i.e. physically singing) or sensory feedback (i.e. hearing a bird's own song), or a combination thereof, are required for song recovery. 

    They specifically delve into whether motor experience, but not sensory experience, is what facilitates recovery. They gave birds slightly more song tutoring, and deafened them through bilateral cochlear removal, which does not suppress the physical action of singing, just removes sensory feedback through audition. They have feedback from somatosensory integration, and vibrational sensing. They find that there is negligible song recovery after practice. They bring up another potential confounding variable which is that birds might be overwhelmed by sudden deafening, so they made a third experimental group which were deafened, monitored for return to stable baseline song, then had their pitch manipulated through a light based mechanism. They were then monitored for pitch recovery, and the researchers discovered that this group was unable to recover to baseline pitch, suggesting that song recovery requires no change in sensory experience, which includes auditory feedback. 

    They continue to conduct a set of experiments where they observe whether birds shift toward baseline when deprived of sensory experience. They hypothesize that hearing a mismatch in pitch fuels birds to plan to recover their pitch target, and this is carried out without feedback. They find that pitch reversion in deaf birds is initially faster than in hearing birds in the WNd (white noise deafened) group, while in the dLO (deafened then light out stimuli) pitch shifts further from baseline, but this change is no different from dC (deaf control) birds. They carry out control experiments to outrule circadian rhythm and time effects on recovery. They conclude by presenting the model that birds plan reduction of mismatch by targeting the mismatch, driven by sensory feedback. Without sensory feedback they are limited to the scope of previous song practice. 

    Overall the authors provide strong reasoning for why they used their specific model organism, valid controls for each experimental data set, and touched on how confounding variables might play into their experiments by ruling them out as contributing factors. They are very thoughtful in their approach and funnel their experiments nicely.

    Part 2. Detailed comments: 

    1. Significance 

    The authors specifically provide a significance statement directly under the abstract which underlines that zebra finches are able to make target-directed changes to song without the need for sensory feedback. This could be placed in the broader scope, in the context of how this relates to birdsong as a model mechanism as a whole. This implies that song (as a model for language) can be modulated in a goal directed manner without current sensory input. 

    2. Observation 

    I believe that the observations of the paper are mostly clear and relative to the points the authors make. Their logic is sound based on the evidence in each figure, and they order their experiments to narrow down variables very concisely. Confounding variables are discussed and even implemented in the data analysis with complementary figures to provide stronger validation of the authors' conclusions. Most of the data is easily readable and understandable, though there are a few figures which are field specific, such as the syllable fluorograms in figure 1. However, they are not superfluous because there is adequate detailing of differences in experimental groups in these figures and they are also elaborated upon quite strongly. There is extensive statistical analysis described which allows us to interpret the results as significant. 

    Figure 1: Recovery of pitch requires practice. This figure begins with an outline of three hypotheses for potential recovery pathways in 1A. Figure 1B displays timepoints for observation of recovery in early or late time points in muted versus control birds, and experimental outline. Figure 1C is an inclusion of potentially confounding data, i.e. spontaneous unmuting events. This is a necessary inclusion because they show their experimental errors did not affect the outcome of the experiment as a whole. 1D is a spectrogram, a visual representation of the song in different experimental groups. This may be difficult for the layman to understand and interpret but is commonplace in this field, so I don't think it's superfluous to add this information. 1E is an elaboration of the previous image, and I think the line chart associated with it (1E) is a great visualization of comparison of pitch in a single syllable in different groups. 1F is a further elaboration where we see syllable variance from baseline. 1H is a bit chaotic to read in my opinion, but the basic message is that control birds return to baseline more regularly than WNm birds either in early or late time stages. Perhaps an average of the samples would make it more reader friendly than having all the interesting data lines. 1I is a better representation of the data in 1H, showing early vs. late recovery, but needs a better key to distinguish the data sets.

    Figure 2: Recovery of pitch is impaired after deafening. 2A is again an outline of their experimental design, 2B displays the change in pitch during reinforcement and the subsequent inability to revert to baseline. 2C displays again a lack of recovery capability after deafening events without sensory feedback in this experiment, and 2D are violin plots of the same data. I would have included the control data for these experiments as a comparison, but the data only displays the WNd experimental group. 

    Figure 3: Deaf birds do not recover pitch target after light induced mismatch. Figure 3 is very similar in structure to figure 2, where there is an outline of the experimental design (3A), followed by visualizations of lack of recovery capability (3B-D). Again I would include these data compared to a control. 

    Figure 4: Target mismatch experience is necessary for revertive pitch change. 4A also begins with a layout of the experimental design. We then observe pitch changes in all the experimental groups (4B-D). 4E displays pitch changes in all groups in the early vs. late recovery period. I like this data because all groups are easily comparable on one plot. 4F (typo as 4C in the figure description) rules out time post-deafening as a confounding factor. I like that they implement this data directly in the article. 

    Figure 5: A visualization of the goal-directed planning pathway of directing vocal change. This is a very strong schematic that outlines the proposed outcome of the amalgamation of their data. 

    3. Interpretation 

    I believe the inferences are supported by adequate observation. There are also multiple visualization modalities for many of the data sets that are offered, making the point of each data set more understandable. We see that they started their experiments broad, observing that recovery of pitch requires practice in figure 1. They then narrow their variables to different modalities of altering pitch to observe whether recovery of song is due to sensory feedback or motor practice. My one question is why the birds in figure 4A were treated differently (i.e. WNd vs. dLO treatment groups) prior to being given either mismatch experience or no mismatch experience. I would have done this experiment in WNd birds and dLO birds separately, with each reinforcement group being given both mismatch and no mismatch experience. This way we can control which variable is giving us the revertive pitch change outcome, be it the reinforcement modality or the mismatch experience. I wonder if there is a possibility to give WNd birds either mismatch or no mismatch experience even though they are deafened at the same time, or give the dLO birds pretreatment with either mismatch or no mismatch experience prior to the deafening process to compare outcomes in the same treatment groups. 

    I like in their discussion section how they relate their findings to humans and propose models for comparing their findings here to future directions in human learning and speech. The authors are also extremely specific in the materials and methods section with time points and descriptive wording, leading me to believe that these results would be robust and reproducible.

    4. Clarity 

    I think the manuscript is very easy to read and has a very good flow. They start with a broad question and go on to narrow it variable by variable. They start with the question of how song is recovered, and then go on to isolate whether this is a result of sensory feedback or motor practice, with clear logic on the formulation of their hypotheses throughout. The figures integrate fantastically into the article by providing detailed descriptions of hypotheses explored in each experiment. They end the article with a schematic of their proposed mechanism of song recovery (Fig 5), that song recovery is the result of goal directed vocal change. There is one figure (Fig 1I) that I think could use a better key on which group is represented by which color, but we can infer this from the previous panel if anything, and I also think figure 1H might benefit from a little bit of data averaging to make it more readable, because the high intersection of lines and data points makes it a bit difficult to follow and interpret. Also, there is one typo in the labeling of the figures (4F labeled as 4C) but otherwise the paper was very clear, had logical flow, and provided convincing evidence to drive their claim.

    Competing interests

    The author declares that they have no competing interests.