A model of hippocampal replay driven by experience and environmental structure facilitates spatial learning


Abstract

Replay of neuronal sequences in the hippocampus during resting states and sleep plays an important role in learning and memory consolidation. Consistent with these functions, replay sequences have been shown to obey current spatial constraints. Nevertheless, replay does not necessarily reflect previous behavior and can construct never-experienced sequences. Here, we propose a stochastic replay mechanism that prioritizes experiences based on three variables: (1) experience strength, (2) experience similarity, and (3) inhibition of return. Using this prioritized replay mechanism to train reinforcement learning agents leads to far better performance than using random replay. Its performance is close to that of the state-of-the-art, but computationally intensive, algorithm by Mattar & Daw (2018). Importantly, our model reproduces diverse types of replay because of the stochasticity of the replay mechanism and experience-dependent differences between the three variables. In conclusion, a unified replay mechanism generates diverse replay statistics and is efficient in driving spatial learning.
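
To make the prioritization concrete, here is a minimal sketch of how the three variables could be combined into stochastic replay probabilities. The variable names and the multiplicative combination are illustrative assumptions, not the paper's exact equations.

```python
import numpy as np

def sample_experience(rng, strength, similarity, inhibition):
    """Stochastically sample one experience for reactivation.

    strength   : per-experience strength (frequency/reward weighting)
    similarity : similarity of each stored experience to the current state
    inhibition : inhibition-of-return term (1 = just reactivated and fully
                 suppressed, 0 = not suppressed)

    The multiplicative combination below is an assumption for illustration.
    """
    priority = strength * similarity * (1.0 - inhibition)
    return rng.choice(len(priority), p=priority / priority.sum())

rng = np.random.default_rng(0)
strength = np.array([1.0, 3.0, 0.5])    # experience 1 was frequent/rewarded
similarity = np.array([0.9, 0.2, 0.4])  # experience 0 resembles the current state
inhibition = np.array([0.8, 0.0, 0.0])  # experience 0 was just reactivated
print(sample_experience(rng, strength, similarity, inhibition))
```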

Article activity feed

  1. eLife assessment

    This paper proposes a new computational model for replay that is biologically realistic and accounts for a number of important phenomena in hippocampal replay. This is an important study with implications for multiple subfields. Whilst the majority of claims are convincingly supported by the data, simulation analyses for some crucial aspects of the replay literature are currently incomplete.

  2. Reviewer #1 (Public Review):

    In this work, Diekmann and Cheng have proposed a new computational model for hippocampal replay. The new model is based on the linear RL work by Piray and Daw (2021) and addresses a fundamental problem in the seminal replay model of Mattar and Daw (2018) (M&D). It is built on the default representation, which is a realistic account of state closeness in model-based RL.

    This study addresses an important problem in neuroscience at the computational level. The proposed theory is a significant normative computational model that captures important aspects of experimental data in the replay literature. The paper is very well-written (a difficult task for purely computational work) and the figures illustrate the main concepts very well. I have only one question/suggestion:

    I believe that there is important data in the literature that cannot be explained by the current model, especially regarding the representation of the goal. That is fine; no model is complete, but it is important that the authors discuss these caveats in the Discussion.

  3. Reviewer #2 (Public Review):

    In their paper, Diekmann and Cheng describe a model for the generation of so-called hippocampal replay sequences - a process thought to play a central role in planning, decision making, and the consolidation of new memories. Given the diversity of functions replay has been purported to support, coming up with a single mechanism for it has remained a challenge. Diekmann and Cheng are able to achieve this with a relatively simple and intuitive model. Specifically, in their model replay is determined by a small number of factors: namely, the likelihood and reward-association of an experience, how similar an experience is to the agent's/animal's current state, and whether an experience matches the current state *too* closely (so as to avoid persistently replaying the same state). With these few ingredients the authors are able to replicate important replay findings. Further, the authors emphasise that their model has the significant advantage of being more biologically feasible than other contemporary models in the field.

    The model broadly achieves its objectives; however, the authors have not sufficiently explained the advantage of their model over other models - i.e., how it addresses the limitations of previous models - nor have they attempted to replicate several important features of replay, such as the fact that it can often be non-local. Finally, the details of the biological implementation of their model, particularly with regard to the two modes it can operate in, have not been fleshed out. These points limit the potential impact of the model.

  4. Reviewer #3 (Public Review):

    This manuscript provides a remarkably simple, yet effective, model of hippocampal replay. A replay event is stitched together as a chain of reactivated experiences. Individual experiences are prioritized for reactivation according to three intuitive measures: the spatial proximity of an experience to the one previously reactivated, the frequency of and reward associated with an experience, and an inhibitory term that propagates the replay across space. Under certain conditions, their model can produce replays that are nearly as optimal (in terms of teaching a reinforcement learning agent to successfully navigate to a reward) as those produced by Mattar and Daw's 2018 model, which, by design, generates the most behaviorally useful replays.
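
    A minimal sketch of the chaining described above, assuming a multiplicative priority and a decaying inhibition-of-return term (the update rules, similarity kernel, and parameter values are illustrative assumptions, not the paper's exact implementation):

    ```python
    import numpy as np

    def generate_replay(rng, strength, similarity_fn, n_steps, inhibit_decay=0.9):
        """Stitch a replay event together as a chain of reactivated experiences."""
        n_exp = len(strength)
        inhibition = np.zeros(n_exp)
        current = rng.choice(n_exp, p=strength / strength.sum())  # seed the event
        event = [current]
        for _ in range(n_steps - 1):
            inhibition[current] = 1.0  # suppress what was just reactivated
            priority = strength * similarity_fn(current) * (1.0 - inhibition)
            if priority.sum() <= 0:
                break
            current = rng.choice(n_exp, p=priority / priority.sum())
            event.append(current)
            inhibition *= inhibit_decay  # inhibition wears off over time
        return event

    # Toy example: 20 experiences along a 1-D track with a Gaussian similarity kernel.
    positions = np.linspace(0.0, 1.0, 20)
    similarity = lambda i: np.exp(-(positions - positions[i]) ** 2 / 0.01)
    rng = np.random.default_rng(1)
    print(generate_replay(rng, np.ones(20), similarity, n_steps=8))
    ```

    Because inhibition zeroes out the most recent experience while similarity favors nearby ones, the sampled chain tends to move steadily across the track rather than oscillating in place.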

    The authors assert that their model can recapitulate the replay statistics observed in a subset of experimental works, including the ability of replay to generate novel 'short cuts' from segments of past experience, the resemblance of replay to Brownian diffusion following random exploration, the ability of replay to steer around environmental barriers, and the observation of pre-play. These claims are generally well supported by the data presented (in particular, the model seems to be quite robust to different parameters).

    One important caveat is that the proposed model requires two modes ('default' and 'reverse') to simultaneously account for empirical findings and provide behavioral utility (the performance of the agent is poor when using the default mode, but comparable to that of Mattar and Daw in the reverse mode). The authors suggest that the brain could dynamically switch between modes (dubbed the 'dynamic' mode). I feel that the paper would be strengthened by focusing on this dynamic mode throughout and demonstrating that it produces replays with statistics matching empirical data. For example, what is the distribution of forward and reverse replays produced by the default mode (Figure 3D)? Since neither mode by itself is adequately consistent with experimental findings, showing that the model appropriately switches between modes would strengthen its plausibility.
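
    As a concrete reading of this suggestion, the dynamic mode could be as simple as a state-dependent switch. The trigger below (reverse replay only upon reward encounter) is a hypothetical rule for illustration, not necessarily the one used in the paper:

    ```python
    def choose_replay_mode(at_reward: bool) -> str:
        """Hypothetical switching rule for a 'dynamic' mode: replay in reverse
        right after encountering reward, when reverse sequences are most useful
        for propagating value backwards; otherwise use the default mode."""
        return "reverse" if at_reward else "default"
    ```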

    The authors state that their model is able to recapitulate the finding that replay in sleep following random exploration can be described by Brownian diffusion. A key point in that paper was that the preceding behavior was not diffusive. The authors go some way to address this point by showing that their model produces diffusive replays even if the strength of experience across space is not uniform. However, it is not clear to me that modeling non-uniform experience strength is equivalent to modeling non-diffusive behavioral trajectories. A more convincing test would have been to simulate realistic behavioral trajectories and show that subsequent replay events are still diffusive.
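
    One way to run such a test is to estimate a diffusion exponent from the mean squared displacement (MSD) of both the simulated trajectories and the resulting replay sequences; Brownian diffusion corresponds to an exponent near 1. A sketch of the estimator (the function name and lag range are ours):

    ```python
    import numpy as np

    def diffusion_exponent(xy, max_lag=20):
        """Fit MSD(lag) ~ lag**alpha in log-log space.

        xy : (T, 2) array of positions, either a behavioral trajectory or the
             sequence of locations reactivated within a replay event.
        alpha ~ 1 indicates Brownian diffusion; alpha < 1 is sub-diffusive,
        alpha > 1 super-diffusive (e.g., ballistic, stereotyped runs).
        """
        lags = np.arange(1, max_lag + 1)
        msd = np.array([np.mean(np.sum((xy[lag:] - xy[:-lag]) ** 2, axis=1))
                        for lag in lags])
        alpha, _ = np.polyfit(np.log(lags), np.log(msd), 1)
        return alpha

    # Sanity check on a 2-D random walk: the estimate should be close to 1.
    rng = np.random.default_rng(2)
    walk = np.cumsum(rng.standard_normal((1000, 2)), axis=0)
    print(round(diffusion_exponent(walk), 2))
    ```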

    In my view, the fact that the model can generate 'pre-play' (in this case, replay of a visually cued, but unvisited arm of the maze) is not particularly informative. In order to generate pre-play, the authors allow the agent to 'visually explore' the cued arm. The implementation of this visual exploration is equivalent to allowing the agent a limited amount of real physical experience on the cued arm. Thus, the finding of replay for the cued arm is unsurprising. It would have been more useful to show that the model over-represents the rewarded arm on a T-maze, given equal exploration of the arms (as in Mattar and Daw).

    Also debatable is the authors' assertion that their model is biologically plausible, whereas that of Mattar and Daw is not. While the former model is certainly computationally less expensive, little experimental data exists that could definitively point to the biological plausibility or implausibility of either model.

    Overall, this model is impressive in its ability to generate replay events with realistic and varied statistics, using only a few simple rules. It will be a welcome addition to the fields of replay, learning and memory, and reinforcement learning.