Neural population dynamics of computing with synaptic modulations

Abstract

In addition to long-timescale rewiring, synapses in the brain are subject to significant modulation that occurs at faster timescales and endows the brain with additional means of processing information. Despite this, models of the brain like recurrent neural networks (RNNs) often have their weights frozen after training, relying on an internal state stored in neuronal activity to hold task-relevant information. In this work, we study the computational potential and resulting dynamics of a network that relies solely on synaptic modulation during inference to process task-relevant information, the multi-plasticity network (MPN). Because the MPN has no recurrent connections, it allows us to study the computational capabilities and dynamical behavior contributed by synaptic modulations alone. The generality of the MPN allows our results to apply to synaptic modulation mechanisms ranging from short-term synaptic plasticity (STSP) to slower modulations such as spike-timing-dependent plasticity (STDP). We thoroughly examine the neural population dynamics of the MPN trained on integration-based tasks and compare them to known RNN dynamics, finding the two to have fundamentally different attractor structure. These differences in dynamics allow the MPN to outperform its RNN counterparts on several neuroscience-relevant tests. Training the MPN across a battery of neuroscience tasks, we find its computational capabilities in such settings are comparable to those of networks that compute with recurrent connections. Altogether, we believe this work demonstrates the computational potential of computing with synaptic modulations and highlights important motifs of these computations so that they can be identified in brain-like systems.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    1. I was confused about the nature of the short-term plasticity mechanism being modeled. In the Introduction, the contrast drawn is between synaptic rewiring and various plasticity mechanisms at existing synapses, including long-term potentiation/depression, and shorter-term facilitation and depression. And the synaptic modulation mechanism introduced is modeled on STDP (which is a natural fit for an associative/Hebbian rule, especially given that short-term plasticity mechanisms are more often non-Hebbian).

    Indeed, because of its associative nature, the modulation mechanism was envisioned to be STDP-like, i.e., operating on faster time scales than the complete rewiring of the network (via backpropagation) but slower time scales than mechanisms like STSP, which, as the reviewer points out, are usually not considered associative. One thing we do want to highlight is that backpropagation and the modulation mechanism are certainly not independent of one another: during training, the network's weights being adjusted by backpropagation are experiencing modulations, and those modulations factor into the gradient calculation.

    We have edited the abstract and introduction to try to make the distinction of what we are trying to model clearer.

    1. cont: On the other hand, in the network models the weights being altered by backpropagation are changes in strength (since the network layers are all-to-all), corresponding more closely to LTP/LTD. And in general, standard supervised artificial neural network training more closely resembles LTP/LTD than changing which neurons are connected to which (and even if there is rewiring, these networks primarily rely on persistent weight changes at existing synapses).

    Although we did not highlight this particular biological mechanism because we wanted to keep the updates as general as possible, one could view the transient modulations and the backpropagation-trained weights as corresponding to early versus late LTP, respectively. We have added to the discussion section an additional discussion of how the associative modulation mechanism and backpropagation might map onto these biological processes.

    1. cont: Moreover, given the timescales of typical systems neuroscience tasks with input coming in on the 100s of ms timescale, the need for multiple repetitions to induce long-term plasticity, and the transient nature/short decay times of the synaptic modulations in the SM matrix, the SM matrix seems to be changing on a timescale faster than LTP/LTD and closer to STP mechanisms like facilitation/depression. So it was not clear to me what mechanism this was supposed to correspond to.

    We note that although the structure of the tasks certainly resembles known neuroscience experiments that happen on shorter time scales (and with the introduction of the 19 new NeuroGym tasks, even more so), we did not have a particular time scale for task effects in mind. Each piece of "evidence" in the integration tasks may therefore occur over significantly slower time scales and could abstractly represent the multiple repetitions needed to induce, say, early-phase LTP.

    Given that the separation between the two plasticity mechanisms may be clearer for STSP, and that many of the tasks we investigate may map more naturally onto time scales relevant to STSP, we have introduced a second modulation rule that depends only on the presynaptic firing rates. See our response to the Essential Revisions above for additional details on these new results.

    2. A number of studies have explored using short-term plasticity mechanisms to store information over time and have found that these mechanisms are useful for general information integration over time. While many of these are briefly cited, I think they need to be further discussed and the current work situated in the context of these prior studies. In particular, it was not clear to me when and how the authors' assumptions differed from those in previous studies, which specific conclusions were novel to this study, and which conclusions are true for this specific mechanism as opposed to being generally true when using STP mechanisms for integration tasks.

    We have added additional works to the related works section and expanded the introduction to try to better convey the differences between our work and previous studies. Briefly, our assumptions differed from previous studies mainly in that we considered a network that relies only on synaptic modulations to do its computations, rather than a network with both recurrence and synaptic modulations. This allowed us to isolate the computational power and behavior of computing with synaptic modulations alone.

    It is hard to say which of the conclusions hold generally when using STP mechanisms for integration tasks without a comprehensive comparison of the various models of STP on the same tasks we investigated here. That being said, we believe we have presented conclusions in this work that are not present in other works (as far as we are aware), including: (1) a demonstration of the strength of computing with synaptic modulations on a large variety of sequential tasks, (2) an investigation into the dynamics of such computations and how they might manifest in neuronal recordings, and (3) a brief look at how these different dynamics might be computationally beneficial in neuroscience-relevant settings. We also note that one reason for the simplicity of our mechanism is that we believe it captures many effects of synaptic modulations (e.g., a gradual increase/decrease of synaptic strength that eventually saturates) with a relatively simple expression, and so we believe other STP mechanisms would yield qualitatively similar results. We have edited the text to try to clarify when conclusions are novel to this study and when we are referencing results from other works.

    Reviewer #2 (Public Review):

    On the other hand, the general principle appears (perhaps naively) very general: any stimulus-dependent, sufficiently long-lived change in neuronal/synaptic properties is a potential memory buffer. For instance, one might wonder whether some non-associative form of synaptic plasticity (unlike the Hebbian-like form studied in the paper), such as short-term synaptic plasticity which depends only on the pre-synaptic activity (and is better motivated experimentally), would be equally effective. Or, for that matter, one might wonder whether just neuronal adaptation, in the hidden layer, for instance, would be sufficient. In this sense, a weakness of this work is that there is little attempt at understanding when and how the proposed mechanism fails.

    We have tried to address whether the simplicity of the tasks considered in this work may be a reason for the MPN's success by training it on 19 additional neuroscience tasks (see response to Essential Revisions above). Across all these additional tasks, we found the MPN performs comparably to its RNN counterparts.

    To address whether associativity is necessary in our setup, we have introduced a version of the MPN whose modulation updates depend only on presynaptic activity. We call this the "MPNpre" and have added several results across the paper addressing its computational abilities (again, additional details are provided above in Essential Revisions). We find the MPNpre has dynamics that are qualitatively the same as those of the MPN and very comparable computational capabilities.
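
    To make the distinction concrete, here is a minimal sketch of the two update rules as described above. It is illustrative only, not the paper's implementation: the specific functional form (frozen weights multiplicatively scaled by the modulation matrix M, a tanh nonlinearity, outer-product updates with rate eta, and uniform decay lam) and all variable names are assumptions.

    ```python
    import numpy as np

    # Sketch of the MPN vs MPNpre modulation updates (illustrative assumptions throughout).
    rng = np.random.default_rng(0)
    n_in, n_hidden = 10, 20
    W = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_hidden, n_in))  # trained, then frozen
    lam, eta = 0.95, 0.1

    def step(M, x, presynaptic_only=False):
        """One time step: hidden activity from modulated synapses, then update M."""
        h = np.tanh((W * (1.0 + M)) @ x)                    # synapses scaled by their modulation
        if presynaptic_only:                                # MPNpre: depends only on the input x
            dM = eta * np.ones((n_hidden, 1)) @ x[None, :]
        else:                                               # MPN: associative (post x pre) update
            dM = eta * h[:, None] @ x[None, :]
        return h, lam * M + dM                              # uniform decay plus update

    M = np.zeros_like(W)
    for t in range(50):
        x = rng.random(n_in)
        h, M = step(M, x, presynaptic_only=True)            # flip the flag to compare MPN vs MPNpre
    ```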

    Certainly, some of the tasks we consider may also be solvable by introducing other forms of computation such as neuronal adaptation. Indeed, we believe the ability of the brain to solve tasks in so many different ways is one of the things that makes it so difficult to study. Our work here has attempted to highlight one particular way of doing computations (via synapse dynamics) and compared it to one particular other form (recurrent connections). Extending this work to even more forms of computation, including neuronal dynamics, would be very interesting and further help distinguish these different computational methods from one another.

    Reviewer #3 (Public Review):

    Because the MPN is essentially a low-pass filter of the activity, and the activity is the input - it seems that integration is almost automatically satisfied by the dynamics. Are these networks able to perform non-integration tasks? Decision-making (which involves saddle points), for instance, is often studied with RNNs.

    We have tested the MPN on 19 additional supervised learning tasks found in the NeuroGym package (Molano-Mazon et al., 2022), which includes several decision-making tasks, and added these results to the main text (see response to Essential Revisions above, and also Figs. 7i & 7j). Across all tasks we investigated, we found the MPN performs at levels comparable to its RNN counterparts.

    Manuel Molano-Mazon, Joao Barbosa, Jordi Pastor-Ciurana, Marta Fradera, Ru-Yuan Zhang, Jeremy Forest, Jorge del Pozo Lerida, Li Ji-An, Christopher J Cueva, Jaime de la Rocha, et al. “NeuroGym: An open resource for developing and sharing neuroscience tasks”. (2022).

    The current work has some resemblance to reservoir computing models. Because the M matrix decays to zero eventually, this is reminiscent of the fading memory property of reservoir models. Specifically, the dynamic variables encode a decaying memory of the input, and - given large enough networks - almost any function of the input can be simply read out. Within this context, there were works that studied how introducing different time scales changes performance (e.g., Schrauwen et al 2007).

    Thank you for pointing out this resemblance and work. In our setup, the fact that lambda is the same for the entire network means all elements of M decay uniformly (though the learned modulation updates may allow the growth of M to be non-uniform). One modification that we think would be very interesting to explore is the effect on the dynamics of non-uniform learning rates or decays across synapses. In this setting, the M matrix could have significantly different time scales and may even more closely resemble reservoir computing setups. We have added a sentence to the discussion section discussing this possibility.
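
    For concreteness, a sketch of this possible extension (a hypothetical variant, not something implemented in the paper): the scalar decay is replaced by a per-synapse decay matrix, so different elements of M forget at different rates. Names, shapes, and the stand-in activities are assumptions.

    ```python
    import numpy as np

    # Hypothetical per-synapse decay variant (illustrative only).
    rng = np.random.default_rng(1)
    n_in, n_hidden, eta = 10, 20, 0.1
    Lam = rng.uniform(0.80, 0.99, size=(n_hidden, n_in))   # each synapse forgets at its own rate
    M = np.zeros((n_hidden, n_in))
    for t in range(50):
        x = rng.random(n_in)                                # stand-in for the input at time t
        h = rng.random(n_hidden)                            # stand-in for the hidden activity at time t
        M = Lam * M + eta * h[:, None] @ x[None, :]         # element-wise decay replaces the uniform lambda
    ```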

    Another point is the interaction of the proposed plasticity rule with hidden-unit dynamics. What will happen for RNNs with these plasticity rules? I see why introducing short-term plasticity in a "clean" setting can help understand it, but it would be nice to see that nothing breaks when moving to a complete setting. Here, too, there are existing works that tackle this issue (e.g., Orhan & Ma, Ballintyn et al, Rodriguez et al).

    Thank you for pointing out these additional works; they are indeed very relevant, and we have added them all to the text where appropriate.

    Here we believe we have shown that either recurrent connections or synaptic dynamics alone can be used to solve a wide variety of neuroscience tasks. We don't believe a hybrid setting with both synaptic dynamics and recurrence (e.g., a Vanilla RNN with synaptic dynamics) would "break" any part of this setup. Since either computational mechanism can be learned to be suppressed, the network could simply solve the task by relying on only one of the two. For example, it could use a strictly non-synaptic solution by driving eta (the learning rate of the modulations) to zero, or it could use a non-recurrent solution by driving the influence of the recurrent connections to be very small. Orhan & Ma mention they have a hard time training a Vanilla RNN with Hebbian modulations on the recurrent weights for any modulation effect that goes back more than one time step, but unlike our work they rely on a fixed modulation strength.
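
    To illustrate the point, here is a sketch of such a hybrid update (ours, not from the paper or the cited works; the functional form and names are assumptions): a Vanilla RNN whose recurrent weights carry a decaying Hebbian modulation. Setting eta to zero keeps the modulations at zero and recovers the plain RNN, while shrinking the recurrent weights pushes the solution toward an MPN-like one.

    ```python
    import numpy as np

    # Hybrid sketch: vanilla RNN with a decaying Hebbian modulation M on its recurrent weights.
    rng = np.random.default_rng(2)
    n_in, n_hid = 10, 30
    W_in = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_hid, n_in))
    W_rec = rng.normal(scale=1.0 / np.sqrt(n_hid), size=(n_hid, n_hid))
    lam, eta = 0.95, 0.1        # eta = 0.0 -> M stays zero and the ordinary RNN update is recovered

    h, M = np.zeros(n_hid), np.zeros((n_hid, n_hid))
    for t in range(100):
        x = rng.random(n_in)
        h_prev = h
        h = np.tanh(W_in @ x + (W_rec * (1.0 + M)) @ h_prev)   # recurrence with modulated synapses
        M = lam * M + eta * h[:, None] @ h_prev[None, :]        # Hebbian update on the recurrent weights
    ```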

    Indeed, we think how networks with multiple computational mechanisms will solve tasks is a very interesting question to be further investigated, and a hybrid solution may be likely. We believe our work is valuable in that it illuminates one end of the spectrum that is relatively unexplored: how such tasks could be solved using just synaptic dynamics. However, what type of solution a complete setup ultimately lands on is likely largely dependent upon both the initialization and the training procedure, so we felt exploring the dynamics of such networks was outside the scope of this work.

    One point regarding biological plausibility - although the model is abstract, the fact that the MPN increases without bound is hard to reconcile with physical processes.

    Note that although the MPN expression does not have explicit bounds, in practice the exponential decay eventually balances the SM matrix updates, and so we observe a saturation in its size (Fig. 4c; the exception is lambda = 1.0, which is not considered elsewhere in the text). However, we explicitly added modulation bounds to the M matrix update expression and did not find this significantly changed the results (see comments on Essential Revisions above for details).
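
    The saturation argument can be seen in a one-synapse caricature: with constant drive u, the update m_t = lambda * m_{t-1} + eta * u converges to eta * u / (1 - lambda) whenever lambda < 1, and only grows without bound at lambda = 1. The sketch below is ours and only illustrative; the drive, rate, and bound values are hypothetical, with the hard clip standing in for the explicit modulation bounds mentioned above.

    ```python
    import numpy as np

    # One-synapse caricature of the decay/update balance; eta, u, and m_max are illustrative values.
    eta, u, m_max = 0.1, 1.0, 5.0

    def run(lam, T=500, clip=False):
        m = 0.0
        for _ in range(T):
            m = lam * m + eta * u              # decay balances the modulation update
            if clip:
                m = np.clip(m, -m_max, m_max)  # explicit modulation bound
        return m

    print(run(0.95))            # ~2.0 = eta*u / (1 - lam): saturates
    print(run(1.0))             # 50.0: grows linearly, no saturation
    print(run(1.0, clip=True))  # 5.0: the added bound caps the growth
    ```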

  2. eLife assessment

    This is a valuable study showing that fast, but transient, modifications of the synaptic efficacies, alone, can support the integration of information over time. Convincing supportive evidence is provided by showing that feed-forward networks, when equipped with such short-term synaptic modulations, can successfully perform a variety of temporal integration tasks at a performance level comparable with that of recurrent networks. These results will be of interest to both neuroscientists and researchers in machine learning.

  3. Reviewer #1 (Public Review):

    Synapses are modulated by neural activity on a variety of timescales. Typical neural network models primarily consider long-lasting changes to synaptic strengths, applied while the network is learning, with synaptic strengths then being fixed after learning. However, shorter-term plasticity mechanisms are ubiquitous in the brain and have been shown to have significant computational and information-storage capabilities. Here the authors study these mechanisms in the context of information-integration tasks. Their two primary contributions are to analyze these short-term mechanisms separately from recurrent connections, isolating the specific ways these might be useful, and to apply ideas from population data analysis to dissect how their networks solve the tasks.

    I thought this was a clear, well-written, and well-organized paper, tackling an important problem. I also found that the conclusions were adequately supported by the simulations and analyses shown. I particularly appreciated the careful analysis of how the different networks solved the task and found the distinction between hidden neurons reflecting accumulated evidence (attractor architecture) vs. reflecting inputs (MPN architecture) very interesting and potentially very useful for thinking about experimental observations. My comments are primarily about the connection to biology/biological interpretability as well as how this study relates to prior work.

    1. I was confused about the nature of the short-term plasticity mechanism being modeled. In the Introduction, the contrast drawn is between synaptic rewiring and various plasticity mechanisms at existing synapses, including long-term potentiation/depression, and shorter-term facilitation and depression. And the synaptic modulation mechanism introduced is modeled on STDP (which is a natural fit for an associative/Hebbian rule, especially given that short-term plasticity mechanisms are more often non-Hebbian). On the other hand, in the network models the weights being altered by backpropagation are changes in strength (since the network layers are all-to-all), corresponding more closely to LTP/LTD. And in general, standard supervised artificial neural network training more closely resembles LTP/LTD than changing which neurons are connected to which (and even if there is rewiring, these networks primarily rely on persistent weight changes at existing synapses). Moreover, given the timescales of typical systems neuroscience tasks with input coming in on the 100s of ms timescale, the need for multiple repetitions to induce long-term plasticity, and the transient nature/short decay times of the synaptic modulations in the SM matrix, the SM matrix seems to be changing on a timescale faster than LTP/LTD and closer to STP mechanisms like facilitation/depression. So it was not clear to me what mechanism this was supposed to correspond to.

    2. A number of studies have explored using short-term plasticity mechanisms to store information over time and have found that these mechanisms are useful for general information integration over time. While many of these are briefly cited, I think they need to be further discussed and the current work situated in the context of these prior studies. In particular, it was not clear to me when and how the authors' assumptions differed from those in previous studies, which specific conclusions were novel to this study, and which conclusions are true for this specific mechanism as opposed to being generally true when using STP mechanisms for integration tasks.

  4. Reviewer #2 (Public Review):

    Most neuronal computations require keeping track of the inputs over temporal windows that exceed the typical time scales of single neurons. A standard and relatively well-understood way of obtaining time scales longer than those of the "microscopic" elements (here, the single neurons) is to have appropriate recurrent synaptic connectivity. Another possibility is to have a transient, input-dependent modulation of some neuronal and/or synaptic properties, with the appropriate time scale. Indeed, there is ample experimental evidence that both neurons and synapses modify their dynamics on multiple time scales, depending on the previous history of activation. There is, however, little understanding of the computational implications of these modifications, in particular for short-term memory.

    Here, the authors have investigated the suitability of a class of transient synaptic modulations for storing and processing information over short time scales. They use a purely feed-forward network architecture so that "synaptic modulation" is the only mechanism available for temporarily storing the information. The network is called the Multi-Plasticity Network (MPN), in reference to the fact that the synaptic connectivity being transiently modulated is itself adjusted via standard supervised learning. They find that, in a series of integration-based tasks of varying difficulty, the MPN exhibits performance comparable with that of (trained) recurrent neuronal networks (RNNs). Interestingly, the MPN consistently outperforms the RNNs when only the read-out is being learned, that is, in a minimal-training condition.

    The conclusions of the paper are convincingly supported by the careful numerical experiments and the analysis performed by the authors, mostly to compare the performances of the MPN against various RNN architectures. The results are intriguing from a "classic" neuroscience perspective, providing a computational point of view to rationalize the various synaptic dynamics observed experimentally on largely different time scales, and are of certain interest to the machine learning community.

    On the other hand, the general principle appears (perhaps naively) very general: any stimulus-dependent, sufficiently long-lived change in neuronal/synaptic properties is a potential memory buffer. For instance, one might wonder whether some non-associative form of synaptic plasticity (unlike the Hebbian-like form studied in the paper), such as short-term synaptic plasticity which depends only on the pre-synaptic activity (and is better motivated experimentally), would be equally effective. Or, for that matter, one might wonder whether just neuronal adaptation, in the hidden layer, for instance, would be sufficient. In this sense, a weakness of this work is that there is little attempt at understanding when and how the proposed mechanism fails.

  5. Reviewer #3 (Public Review):

    The authors study the performance, generalization, and dynamics of artificial neural networks trained on integration tasks. These types of tasks were studied theoretically in the past, and comparisons have also been made between artificial and biological networks. The authors focus on the effect of short-term plasticity on the networks. This is modeled as a multiplicative modulation of synaptic strengths that decays over time. When not decaying, this modulation is driven by Hebbian (or anti-Hebbian) activity-dependent terms. To isolate the effects of this component of the networks, the authors study a feedforward architecture, thereby rendering the synaptic modulations the only dynamical variables in the system. The authors also compare their network (MPN) with RNNs (gated and vanilla).

    Perhaps not surprisingly, the information on the integration task is encoded in the dynamic variables of the networks - which are hidden units for RNNs and synaptic modulations for MPNs. The authors also study the dynamics of MPNs in the presence of noise or longer-than-trained input sequences. Finally, context-dependent integration is also studied.

    Biological neurons are far more complex than their artificial counterparts. This implies that there are computations that can be "outsourced" to these complexities, instead of being handled by a vanilla-rnn-like network that only has connectivity and hidden states. Given the recent rise in applications of trained RNNs as models of biological systems, it is thus timely to ask what are the consequences of integrating some of these complexities. The current study falls under this broad question, with a focus on short-term synaptic plasticity.

    I am worried, however, by two issues: the relation between integration tasks and the plasticity mechanism introduced, and the relation to existing work.

    Because the MPN is essentially a low-pass filter of the activity, and the activity is the input - it seems that integration is almost automatically satisfied by the dynamics. Are these networks able to perform non-integration tasks? Decision-making (which involves saddle points), for instance, is often studied with RNNs.

    The current work has some resemblance to reservoir computing models. Because the M matrix decays to zero eventually, this is reminiscent of the fading memory property of reservoir models. Specifically, the dynamic variables encode a decaying memory of the input, and - given large enough networks - almost any function of the input can be simply read out. Within this context, there were works that studied how introducing different time scales changes performance (e.g., Schrauwen et al 2007).

    Another point is the interaction of the proposed plasticity rule with hidden-unit dynamics. What will happen for RNNs with these plasticity rules? I see why introducing short-term plasticity in a "clean" setting can help understand it, but it would be nice to see that nothing breaks when moving to a complete setting. Here, too, there are existing works that tackle this issue (e.g., Orhan & Ma, Ballintyn et al, Rodriguez et al).

    One point regarding biological plausibility - although the model is abstract, the fact that the MPN increases without bound is hard to reconcile with physical processes.

    To summarize, the authors show that plastic synapses can perform integration tasks in a manner that is dynamically distinct from RNNs - thereby strengthening the argument to include such synapses in models. This can be of interest to researchers interested in biologically plausible models of neural circuits.

    Schrauwen, Benjamin, Jeroen Defour, David Verstraeten, and Jan Van Campenhout. "The Introduction of Time-Scales in Reservoir Computing, Applied to Isolated Digits Recognition." In Artificial Neural Networks - ICANN 2007, edited by Joaquim Marques de Sá, Luís A. Alexandre, Włodzisław Duch, and Danilo Mandic, 471-79. Lecture Notes in Computer Science 4668. Springer Berlin Heidelberg, 2007. http://link.springer.com/chapter/10.1007/978-3-540-74690-4_48.

    Orhan, A. Emin, and Wei Ji Ma. "A Diverse Range of Factors Affect the Nature of Neural Representations Underlying Short-Term Memory." Nature Neuroscience 22, no. 2 (February 2019): 275-83. https://doi.org/10.1038/s41593-018-0314-y.

    Ballintyn, B., Shlaer, B. & Miller, P. Spatiotemporal discrimination in attractor networks with short-term synaptic plasticity. J Comput Neurosci 46, 279-297 (2019). https://doi.org/10.1007/s10827-019-00717-5

    Rodriguez, H.G., Guo, Q. & Moraitis, T. (2022). Short-Term Plasticity Neurons Learning to Learn and Forget. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:18704-18722. Available from https://proceedings.mlr.press/v162/rodriguez22b.html.