Cortical adaptation to sound reverberation

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This paper identifies a new adaptation phenomenon in the cortical representation of sound that could explain invariance of auditory perception to reverberations of sounds in the environment.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #3 agreed to share their name with the authors.)

Abstract

In almost every natural environment, sounds are reflected by nearby objects, producing many delayed and distorted copies of the original sound, known as reverberation. Our brains usually cope well with reverberation, allowing us to recognize sound sources regardless of their environments. In contrast, reverberation can cause severe difficulties for speech recognition algorithms and hearing-impaired people. The present study examines how the auditory system copes with reverberation. We trained a linear model to recover a rich set of natural, anechoic sounds from their simulated reverberant counterparts. The model neurons achieved this by extending the inhibitory component of their receptive filters for more reverberant spaces, and did so in a frequency-dependent manner. These predicted effects were observed in the responses of auditory cortical neurons of ferrets in the same simulated reverberant environments. Together, these results suggest that auditory cortical neurons adapt to reverberation by adjusting their filtering properties in a manner consistent with dereverberation.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    The authors showed that longer reverberation time prolongs inhibitory receptive fields in cortex and suggest that this helps produce sound representations that are more stable to reverberation effects. The claim is qualitatively well supported by two controls: one based on probe responses to the same type of white noise in two different reverberation contexts, and one based on receptive fields measured at different time points after the switch between two reverberation conditions. The latter gives stronger results and thus constitutes a more convincing control that the longer decay of inhibition is not an artifact of stimulus statistics. The limits of the study include the use of anesthesia and the fact that cortex shows a very broad range of dereverberation effects, actually much broader than predicted by a simple model. This result confirms that reverberation produces cortical adaptation as suggested before, and suggests as a mechanistic hypothesis that rapid plasticity of inhibition underlies this adaptation. However, the paper does not address whether this adaptation occurs in cortex or in subcortical structures. The fact that an effect is observed under anesthesia suggests a subcortical origin.

    We agree that it is important to consider subcortical processing levels too, as we have done previously when investigating neuronal adaptation to mean sound level and contrast. However, these and other forms of adaptation are known to be organized hierarchically and are most prominent in the auditory cortex. In particular, in ferrets, the species we use in our study, contrast adaptation is a weaker and less consistent property of neurons in the inferior colliculus than of neurons in the primary auditory cortex (Rabinowitz et al., 2013, PLOS Biol. 11:e1001710). Similar results have been obtained for stimulus-specific adaptation and prediction error signaling in other species (Parras et al., 2017, Nature Comms. 8:2148; Harpaz et al., 2021, Prog. Neurobiol. 202:102049). It therefore makes considerable sense to focus here on the primary auditory cortical areas in ferrets, where adaptation to reverberation has been demonstrated before (Mesgarani et al., 2014, PNAS 111:6792-7), in order to explore the possible basis for this effect. We agree that future work should investigate whether adaptive shifts in the inhibitory components of the receptive fields with room size are a property of the cortex only or are also found in subcortical auditory areas, such as the thalamus or midbrain.

    We chose to record from anesthetized ferrets in order to provide the stability required for presenting the long stimulus sequences that were essential for characterizing the effects of reverberation on the responses of cortical neurons. This strategy was adopted only because we have previously shown that contrast adaptation is indistinguishable in the primary auditory cortex of awake and anesthetized ferrets (Rabinowitz et al., 2011, Neuron 70:1178-91). Furthermore, adaptation to background noise has been shown to enhance the representation of speech in the human auditory cortex independently of the attentional focus of the listeners (Khalighinejad et al., 2019, Nature Comms. 10:2509). All the same, while there is much evidence to indicate that adaptation does not differ, at least qualitatively, with brain state, it would be interesting in future research to determine how task engagement affects the inhibitory plasticity that we observed in this study.

    Reviewer #2 (Public Review):

    Ivanov et al. examined how auditory representations may become invariant to reverberation. They illustrate the spectrotemporal smearing caused by reverberation and explain how dereverberation may be achieved through neural tuning properties that adapt to reverberation times. In particular, inhibitory responses are expected to be more delayed for longer reverberation times. Consistently, inhibition should occur earlier for higher frequencies where reverberation times are naturally shorter. In the manuscript, these two dependent relationships were derived not directly from acoustic signals but from estimated relationships between reverberant and anechoic signal representations after introducing some basic nonlinearity of the auditory periphery. They found consistent patterns in the tuning properties of auditory cortical neurons recorded from anesthetized ferrets. The authors conclude that auditory cortical neurons adapt to reverberation by adjusting the delay of neural inhibition in a frequency-specific manner and consistent with the goal of dereverberation.

    Strengths:

    This main conclusion is supported by the data. The dynamic nature of the observed changes in neural tuning properties is demonstrated mainly for naturalistic sounds presented in persistent virtual auditory spaces. The use of naturalistic sounds supports the generalization of their findings to real-life scenarios. In addition, three control investigations were conducted to back up their conclusions: they investigated the build-up of the adaptation effect in a paradigm switching the reverberation time every 8 seconds; they analyzed to what degree the observed changes in tuning properties may result from differences in the stimulus sets and unknown nonlinearities; and, most convincingly, they demonstrated after-effects on anechoic probes.

    Thank you.

    Weaknesses:

    1. The strength of neural adaptation appears overestimated in the main body of the text. The effect sizes obtained in control conditions with physically identical stimuli (anechoic probes, Fig. 3-Supp. 3B; build-up after switching, Fig. 3-Supp. 4B-C) are considerably smaller than the ones obtained for the main analyses with physically different stimuli. In fact, the effect sizes for the control conditions are similar to those attributed to the physical differences alone (Fig. 3-Supp. 2B).

    The best estimates of the magnitude of the neural adaptation in our paper come from the STRF analysis, and the potential effects of stimulus differences are estimated using our simulated neurons method. While the noise burst and room switching experiments are very valuable controls for verifying the presence of the adaptation, they may underestimate the adaptation’s magnitude because the responses to the anechoic noise burst probes may become partially unadapted during their progress, lessening the adapted effects for these sounds. Likewise, the room switching control may not capture the full magnitude of the adaptive effect because the two time windows used to assess the adaptation (i.e. L1 and L2 or S1 and S2) have limited resolution and may not be optimally matched to the time course of the adaptation. However, the noise burst and room switching analyses are critical controls in our study, even if the measured effects may be more subtle. Crucially, these analyses demonstrate that the reverberation adaptation can be observed even for physically identical stimuli. This confirms, in addition to our simulated neuron methods, that the effects described in our manuscript cannot be entirely due to fitting artifacts resulting from comparing neural responses to different acoustic stimuli, but rather result, at least in part, from an underlying adaptive process.

    2. All but one analysis depends on so-called cochleagrams that very roughly approximate the spectrotemporal transfer characteristics of the auditory periphery. Basically, logarithmic power values of a time-frequency transformation with a linear frequency scale are grouped into logarithmically spaced frequency bins. This choice of auditory signal representation appears suboptimal in various contexts:

    On the one hand, for the predictions generated from the proposed "normative model" (linear convolution kernels linking anechoic with reverberant cochleagrams), the non-linearity introduced by the cochleagrams is not necessary. The same predictions can be derived from purely acoustical analyses of the binaural room impulse responses (BRIRs). Perfect dereverberation of a binaural acoustic signal is achieved by deconvolution with the BRIR (first impulse of the BRIR may be removed before deconvolution in order to maintain the direct path).

    On the other hand, the estimation of neural tuning properties (denoted as spectro-temporal receptive fields, STRFs) assumes a linear relationship between the cochleagram and the firing rates of cortical neurons. However, there are well-described nonlinearities and adaptation mechanisms taking place even up to the level of the auditory nerve. Not accounting for those effects likely impedes the STRF fits and makes all subsequent analyses less reliable. I trust the small but consistent effect observed for the anechoic probes (Fig. 3-Supp. 3B) the most because it does not rely on STRF fits.

    Finally, the simplistic nature of the cochleagram is not able to partial out the contribution of peripheral adaptation from the adaptation observed at cortical sites.

    The reviewer brings up two important issues to consider here. The first is our use of cochleagrams to model peripheral input to the auditory cortex. The second is our use of STRFs to model the receptive fields of auditory cortical neurons.

    In a recent study (Rahman et al., 2020, PNAS 117:28442-51), we tested a wide range of cochlear models to examine which model provides the best preprocessing stage for predicting neural responses to natural sounds in the ferret primary auditory cortex. We found that the cochlear models used to produce cochleagrams in the current manuscript performed best, outperforming even more complicated and biologically-inspired cochlear models (e.g. Bruce et al., 2018, Hearing Research 360:40-54). This therefore determined our choice of cochlear model. However, to address the reviewer’s concern, we replicated our reverberation adaptation findings using Bruce et al.’s (2018) more complex cochlear model, and we include the results of this analysis in our revised version of the manuscript.
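
    For readers unfamiliar with this kind of representation, the reviewer's description (log power of a linear-frequency time-frequency transform pooled into logarithmically spaced bins) can be sketched roughly as follows. This is an illustration only and not the cochlear model used in the manuscript: the function name, window length, hop size, number of bands, lower frequency edge, and the choice to pool power before taking the logarithm are all assumptions made for the example.

```python
import numpy as np

def log_band_cochleagram(signal, fs, n_fft=512, hop=128, n_bands=16, f_min=200.0):
    """Toy cochleagram: short-time power spectrum pooled into log-spaced bands.

    All parameter values are illustrative assumptions, not the manuscript's settings.
    Returns (band levels in dB with shape (n_bands, n_frames), band edges in Hz).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Power spectrum on a linear frequency axis.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    # Logarithmically spaced band edges from f_min up to the Nyquist frequency.
    edges = np.geomspace(f_min, fs / 2.0, n_bands + 1)
    band_power = np.empty((n_bands, n_frames))
    for b in range(n_bands):
        in_band = (freqs >= edges[b]) & (freqs < edges[b + 1])
        band_power[b] = power[:, in_band].sum(axis=1)
    # Log compression; the small constant avoids log(0) in empty or silent bands.
    return 10.0 * np.log10(band_power + 1e-12), edges

# Usage: a 1 s, 1500 Hz tone should dominate the band that contains 1500 Hz.
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1500.0 * t)
cgram, edges = log_band_cochleagram(tone, fs)
```

    A representation like this has no peripheral adaptation or compression beyond the logarithm, which is the limitation the reviewer raises.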

    STRFs are widely used to model the receptive fields of neurons in the auditory system, and particularly in the primary auditory cortex. Nevertheless, the reviewer is correct to point out that these linear models of neural receptive fields are limited, and many cortical neurons show nonlinear aspects in their frequency and temporal tuning. In the present study, the use of STRFs in the normative dereverberation model allowed us to produce predictions for neural tuning across reverberant conditions that could be directly tested in the STRFs of real cortical neurons. It is less clear to us how an acoustical analysis of BRIRs would translate into predicted neural firing patterns. While the simple STRF model used here provided new insights into a mechanism for reverberation adaptation in the auditory cortex, it would be interesting and valuable for future studies to test non-linear receptive field properties in this context. Future studies should also examine contributions to reverberation adaptation at other levels of the auditory system, including subcortical stations.
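
    The general idea of a linear dereverberation kernel can be illustrated with a deliberately simplified toy, given below. It is a sketch under strong assumptions and not the model from the manuscript: it uses a single channel rather than a full cochleagram, works in the linear amplitude domain, simulates reverberation as a pure exponential tail, and fits the kernel by ridge regression. The helper names (`lagged_design`, `fit_dereverb_kernel`, `reverberate`) and all parameter values are hypothetical.

```python
import numpy as np

def lagged_design(x, n_lags):
    """Column k holds x delayed by k samples (zero-padded at the start)."""
    padded = np.concatenate([np.zeros(n_lags - 1), x])
    return np.stack(
        [padded[n_lags - 1 - k : n_lags - 1 - k + len(x)] for k in range(n_lags)],
        axis=1,
    )

def fit_dereverb_kernel(reverberant, anechoic, n_lags=10, ridge=1e-6):
    """Ridge-regression kernel mapping recent reverberant samples to the anechoic sample."""
    X = lagged_design(reverberant, n_lags)
    gram = X.T @ X + ridge * np.eye(n_lags)
    return np.linalg.solve(gram, X.T @ anechoic)

def reverberate(x, decay, length=200):
    """Toy reverberation: convolve with an exponentially decaying tail (assumption)."""
    tail = decay ** np.arange(length)
    return np.convolve(x, tail)[: len(x)]

rng = np.random.default_rng(0)
anechoic = rng.random(5000)  # stand-in for a non-negative single-band envelope

# Kernels for a short and a long decay; index 0 is the current sample, index k is lag k.
kernel_small = fit_dereverb_kernel(reverberate(anechoic, 0.5), anechoic)
kernel_large = fit_dereverb_kernel(reverberate(anechoic, 0.9), anechoic)
```

    In this toy, the exact inverse of an exponential tail with decay `a` is the two-tap filter `[1, -a]`, so the fitted kernels should come out close to a positive weight at lag 0 followed by a negative (suppressive) weight at lag 1 that grows with the decay constant. The toy does not reproduce the temporally extended, frequency-dependent inhibition described in the manuscript; it only shows why recovering an anechoic signal with a linear filter calls for delayed negative weights.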

  2. Evaluation Summary:

    This paper identifies a new adaptation phenomenon in the cortical representation of sound that could explain invariance of auditory perception to reverberations of sounds in the environment.

  3. Reviewer #1 (Public Review):

    The authors showed that longer reverberation time prolongs inhibitory receptive fields in cortex and suggest that this helps produce sound representations that are more stable to reverberation effects. The claim is qualitatively well supported by two controls: one based on probe responses to the same type of white noise in two different reverberation contexts, and one based on receptive fields measured at different time points after the switch between two reverberation conditions. The latter gives stronger results and thus constitutes a more convincing control that the longer decay of inhibition is not an artefact of stimulus statistics. The limits of the study include the use of anesthesia and the fact that cortex shows a very broad range of dereverberation effects, actually much broader than predicted by a simple model. This result confirms that reverberation produces cortical adaptation as suggested before, and suggests as a mechanistic hypothesis that rapid plasticity of inhibition underlies this adaptation. However, the paper does not address whether this adaptation occurs in cortex or in subcortical structures. The fact that an effect is observed under anesthesia suggests a subcortical origin.

  4. Reviewer #2 (Public Review):

    Ivanov et al. examined how auditory representations may become invariant to reverberation. They illustrate the spectrotemporal smearing caused by reverberation and explain how dereverberation may be achieved through neural tuning properties that adapt to reverberation times. In particular, inhibitory responses are expected to be more delayed for longer reverberation times. Consistently, inhibition should occur earlier for higher frequencies where reverberation times are naturally shorter. In the manuscript, these two dependent relationships were derived not directly from acoustic signals but from estimated relationships between reverberant and anechoic signal representations after introducing some basic nonlinearity of the auditory periphery. They found consistent patterns in the tuning properties of auditory cortical neurons recorded from anesthetized ferrets. The authors conclude that auditory cortical neurons adapt to reverberation by adjusting the delay of neural inhibition in a frequency-specific manner and consistent with the goal of dereverberation.

    Strengths:

    This main conclusion is supported by the data. The dynamic nature of the observed changes in neural tuning properties is demonstrated mainly for naturalistic sounds presented in persistent virtual auditory spaces. The use of naturalistic sounds supports the generalization of their findings to real-life scenarios. In addition, three control investigations were conducted to back up their conclusions: they investigated the build-up of the adaptation effect in a paradigm switching the reverberation time every 8 seconds; they analyzed to what degree the observed changes in tuning properties may result from differences in the stimulus sets and unknown non-linearities; and, most convincingly, they demonstrated after-effects on anechoic probes.

    Weaknesses:

    1. The strength of neural adaptation appears overestimated in the main body of the text. The effect sizes obtained in control conditions with physically identical stimuli (anechoic probes, Fig. 3-Supp. 3B; build-up after switching, Fig. 3-Supp. 4B-C) are considerably smaller than the ones obtained for the main analyses with physically different stimuli. In fact, the effect sizes for the control conditions are similar to those attributed to the physical differences alone (Fig. 3-Supp. 2B).
    2. All but one analysis depends on so-called cochleagrams that very roughly approximate the spectrotemporal transfer characteristics of the auditory periphery. Basically, logarithmic power values of a time-frequency transformation with a linear frequency scale are grouped into logarithmically spaced frequency bins. This choice of auditory signal representation appears suboptimal in various contexts:
      On the one hand, for the predictions generated from the proposed "normative model" (linear convolution kernels linking anechoic with reverberant cochleagrams), the non-linearity introduced by the cochleagrams is not necessary. The same predictions can be derived from purely acoustical analyses of the binaural room impulse responses (BRIRs). Perfect dereverberation of a binaural acoustic signal is achieved by deconvolution with the BRIR (first impulse of the BRIR may be removed before deconvolution in order to maintain the direct path).
      On the other hand, the estimation of neural tuning properties (denoted as spectro-temporal receptive fields, STRFs) assumes a linear relationship between the cochleagram and the firing rates of cortical neurons. However, there are well-described nonlinearities and adaptation mechanisms taking place even up to the level of the auditory nerve. Not accounting for those effects likely impedes the STRF fits and makes all subsequent analyses less reliable. I trust the small but consistent effect observed for the anechoic probes (Fig. 3-Supp. 3B) the most because it does not rely on STRF fits.
      Finally, the simplistic nature of the cochleagram is not able to partial out the contribution of peripheral adaptation from the adaptation observed at cortical sites.

  5. Reviewer #3 (Public Review):

    The paper by Ivanov et al. examines how the auditory system adapts in reverberant acoustic conditions. Using a linear dereverberation framework, the study tests whether the tuning properties of neurons change in a similar manner to what is predicted by a linear dereverberation filter. The study shows that dereverberation is achieved by an extension of the inhibitory regions of receptive fields in a frequency-dependent manner. Notably, this result is complemented by showing a change in the cortical responses to probe sounds presented in the context of different reverberant conditions. Together, the similarity of the computational predictions and experimental findings supports an adaptive cortical mechanism that can reduce the effect of reverberation and, in turn, support noise-robust auditory perception.