Grid cells encode reward distance during path integration in cue-rich environments
Listed in: Evaluated articles (PREreview)
Abstract
The medial entorhinal cortex supports both path integration and landmark anchoring, but how these computations interact during goal-directed navigation is unclear. We show that grid cells dissociate from landmarks and instead encode reward distance when mice perform a path integration task on a cue-rich treadmill. Grid cell population activity reset at rewards and shifted coherently across trials, consistent with continuous attractor dynamics realigned by rewards. Furthermore, grid cells exhibited reduced spatial scales, broadened theta frequency distributions, and altered temporal coordination. These phenomena were captured by a theta interference model incorporating cell competition and two sets of theta oscillating inputs whose frequencies shifted apart. Switching to cue-based navigation stabilized the firing fields and partially restored grid scale, theta frequencies and temporal structure. These results demonstrate that MEC circuits flexibly reset to encode goal-directed trajectories, and suggest that continuous attractor and interference mechanisms normally cooperate but can decouple under path integration demands.
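The interference mechanism mentioned in the abstract can be illustrated with a generic one-dimensional oscillatory-interference toy. This is a sketch in the spirit of classic interference models, not the authors' model, which adds cell competition and two sets of inputs. It assumes a constant running speed and one input whose frequency rises with speed by a hypothetical gain `beta` (cycles per metre). The two inputs then beat at `beta * speed` Hz, so fields recur every `1 / beta` metres, and pushing the frequencies further apart shrinks the field spacing. All function names and parameter values are illustrative assumptions.

```python
import math
import statistics


def interference_signal(t, f_base, f_active):
    """Sum of two theta-band oscillations; large where they are in phase."""
    return math.cos(2 * math.pi * f_base * t) + math.cos(2 * math.pi * f_active * t)


def field_centres(speed, f_base, beta, duration=20.0, dt=1e-4,
                  threshold=1.8, min_gap=0.1):
    """Simulate a run at constant speed (m/s) and return field centres (m).

    The 'active' input runs at f_base + beta * speed Hz, so the summed signal
    beats at beta * speed Hz and fields recur every 1 / beta m. Supra-threshold
    samples separated by more than min_gap (m) are assigned to different fields.
    """
    f_active = f_base + beta * speed
    fields, current, last_x = [], [], None
    for i in range(int(duration / dt)):
        t = i * dt
        x = speed * t
        if interference_signal(t, f_base, f_active) > threshold:
            if last_x is not None and x - last_x > min_gap:
                fields.append(current)  # a large jump closes the previous field
                current = []
            current.append(x)
            last_x = x
    if current:
        fields.append(current)
    return [statistics.fmean(f) for f in fields]


def field_spacing(speed, f_base, beta, **kwargs):
    """Median distance between neighbouring fields (robust to clipped edge fields)."""
    centres = field_centres(speed, f_base, beta, **kwargs)
    return statistics.median(b - a for a, b in zip(centres, centres[1:]))


if __name__ == "__main__":
    # A larger beta (frequencies shifted further apart) should give a smaller spacing.
    for beta in (2.0, 2.5):
        print(beta, round(field_spacing(speed=0.2, f_base=8.0, beta=beta), 3))
```

With these assumed values the printed spacings should lie close to `1 / beta`, i.e. the field spacing falls as the frequency gap widens at a fixed running speed.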
Article activity feed
This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/17993230.
Summary
This review was prepared by Deryn O. LeDuke under the supervision of Kay M. Tye as part of a partnership with the HHMI TAP peer review workshop to improve transparency and accountability in peer review.
Machen et al. make the primary claim that dopamine 2 receptor-expressing (D2R+) neurons in the paraventricular nucleus of the thalamus (PVT) encode current and, fascinatingly, expected internal state in mice. The study follows up on previous work (Beas et al., 2024; PMID: 38458192) demonstrating that D2R+ neurons are tuned to changes in physiological state, and focuses specifically on in vivo fiber photometry recordings from D2R+ PVT neurons in mice performing a linear maze task. Interoception is a critical ability in animals, encompassing the sensing of cues needed to maintain homeostasis (e.g., temperature, thirst, hunger, heart rate). The potential impact of this research is underscored by the many psychiatric disorders in which interoception is impaired, including panic disorder, substance abuse, and psychosis. A study demonstrating interoceptive prediction could be highly significant for improving quality of life for these individuals, and I therefore find the premise of this paper highly novel. While the arcuate nucleus (ARC) and lateral hypothalamus (LH) are typically implicated in "sensing" and "tracking" internal state, Machen et al. propose that PVT(D2R+) neurons may serve as an "integrator" of these signals (current state, environmental cues, and learned experience), anticipating changes in internal state and thereby guiding motivated behaviors.
There are several positive elements to appreciate in this paper (e.g., investigating many axes of behavior within one paradigm and using within-animal comparisons to investigate state changes); however, I have major concerns that I believe significantly affect the strength of the authors' claims. My feedback primarily concerns the experimental design and the selection of data for the main figures.
Major Points
Motor Confounds in Data: The authors' primary claim is that PVT(D2R+) activity reflects motivation driven by changes in internal state; however, the PVT(D2R+) signal tracks approach latency in every group. In every figure, the group with the shortest approach latency has the largest signal (e.g., Figs 1E, 2C, 3M), suggesting that running velocity and motivational state are confounded in these results. Demonstrating that the mice's velocity through the linear maze is not correlated with increases in GCaMP signal would greatly strengthen the primary claim and resolve this potential confound. The authors could plot velocity (rather than the latency shown in Fig 1K/P) against the approach signal, or measure signal changes when mice are not engaged in the linear maze task and/or are consuming state-relevant rewards (SMS for hunger, H2O for thirst), to separate the contributions of motor output and motivation.
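One way to run the velocity check suggested above is sketched below. It assumes per-trial summaries (a condition label, mean running velocity, and approach-period signal) in a hypothetical tuple format. The sketch fits a single pooled least-squares regression of signal on velocity and then averages the residuals per condition; a helper also gives the within-condition velocity-signal correlation. This is one simple option, not the authors' analysis. A mixed-effects model with mouse as a random effect would be a natural extension for real data.

```python
import statistics
from collections import defaultdict


def pearson_r(x, y):
    """Pearson correlation, e.g. between velocity and signal within one condition."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def velocity_adjusted_means(trials):
    """trials: list of (condition, velocity, signal) tuples, one per trial.

    Fits signal ~ velocity by ordinary least squares across all trials, then
    averages the residuals per condition. Condition differences that survive
    this adjustment are not explained by a linear effect of running velocity.
    """
    vel = [v for _, v, _ in trials]
    sig = [s for _, _, s in trials]
    mv, ms = statistics.fmean(vel), statistics.fmean(sig)
    slope = (sum((v - mv) * (s - ms) for v, s in zip(vel, sig))
             / sum((v - mv) ** 2 for v in vel))
    intercept = ms - slope * mv
    residuals = defaultdict(list)
    for cond, v, s in trials:
        residuals[cond].append(s - (intercept + slope * v))
    return {cond: statistics.fmean(r) for cond, r in residuals.items()}
```

If the signal were purely a function of velocity, every adjusted condition mean would be near zero; a state effect would survive as a difference between adjusted means.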
Training Data: Training data for the cohorts are not included in the results; these would be extremely helpful for determining whether performance differs between groups or mice. Training data are shown in Figures 7-8, but these appear to come from groups separate from those in Figures 1-6. Explicitly detailing or summarizing training performance for each group would strengthen claims about whether animals learned the task. Training performance is a concern because the number of premature trials appears to increase significantly over the training period (Fig 8E). The authors do not discuss why mice might become less proficient at avoiding premature trials, but this pattern suggests an order effect across successive training periods that could confound the main conclusions of the paper. Plotting order-sensitive session features, such as premature trials, from the fiber photometry experiments would strengthen the initial results demonstrating changes with internal state.
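Order-sensitive session features such as premature trials could be checked for a systematic drift across sessions. The sketch below assumes one value per session in chronological order and computes the least-squares slope against session index. It then derives a one-sided permutation p-value by shuffling session order. The function names and the choice of test are illustrative assumptions rather than the authors' analysis; real data would call for a per-mouse or mixed-effects version.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


def trend_permutation_test(per_session, n_perm=5000, seed=0):
    """Slope across sessions and a one-sided permutation p-value for an increase.

    per_session: one value per session in chronological order (e.g. the number
    or fraction of premature trials). The null distribution is built by
    shuffling the session order, which destroys any systematic trend.
    """
    sessions = list(range(len(per_session)))
    observed = ols_slope(sessions, per_session)
    rng = random.Random(seed)
    shuffled = list(per_session)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if ols_slope(sessions, shuffled) >= observed:
            at_least_as_large += 1
    # The +1 terms count the observed ordering itself, keeping p above zero.
    return observed, (at_least_as_large + 1) / (n_perm + 1)
```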
Potential Order Effects: The authors specify sessions in which they altered the animal's internal state and the reward (e.g., Hungry-strawberry Ensure (SMS); Sated-SMS; Hungry-sucralose (NCal)); however, these states and rewards do not appear to be counterbalanced, and some figures appear to lack complete sets of groups (e.g., Fig 1C is missing Sated-NCal; Fig 3A is missing Sated-SMS). The authors' conclusions about internal state are therefore confounded with the order of the recordings: if the Hungry-SMS condition had not come first in the series, would the neural activity look the same? These results are further confounded by the absence of some necessary controls, namely the sated counterparts of some of the hungry/thirsty conditions (for example, among Hungry-SMS, Hungry-NCal, and Sated-SMS, there is no comparison to a Sated-NCal group). If mice were recorded in these conditions, as appears to be the case elsewhere in the paper, it would be helpful to clarify whether these conditions were run in the same mice, which would address this concern. The authors' conclusions would be strengthened by including a counterbalanced cohort with a complete set of conditions. The same point applies to within-session changes to the reward, specifically the reward size variation task (Fig 6).
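A counterbalanced schedule of the kind requested could be generated with a Williams design, a standard crossover construction. In this design, each condition appears once in every session position. For an even number of conditions, each condition also immediately follows every other condition exactly once, which balances first-order carry-over. The sketch below assumes one order per mouse (or per group of mice) and uses hypothetical labels for a complete 2 x 2 (state x reward) design.

```python
def williams_orders(conditions):
    """Return session orders (one per mouse or group) forming a Williams design.

    Every condition appears once in every session position. For an even number
    of conditions, each ordered pair of consecutive conditions also occurs
    exactly once; for an odd number, mirrored orders are appended so that each
    pair occurs exactly twice.
    """
    n = len(conditions)
    first, low, high = [0], 1, n - 1
    take_low = True
    while len(first) < n:  # first row: 0, 1, n-1, 2, n-2, ...
        if take_low:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
        take_low = not take_low
    rows = [[(j + shift) % n for j in first] for shift in range(n)]
    if n % 2:
        rows += [row[::-1] for row in rows]
    return [[conditions[k] for k in row] for row in rows]


if __name__ == "__main__":
    # Hypothetical labels for a complete 2 x 2 (state x reward) design.
    conds = ["Hungry-SMS", "Hungry-NCal", "Sated-SMS", "Sated-NCal"]
    for mouse, order in enumerate(williams_orders(conds)):
        print(f"mouse {mouse}: {order}")
```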
Trial Matching: Depending on the animal's state, more or fewer trials are completed in a given session (e.g., Fig 1Q). It is unclear whether the authors included all trials from all groups or matched trial numbers by random subsampling, which would be the more statistically rigorous approach. If all trials were included, differences in trial counts between groups could alter the variance of the estimates and thus substantially skew the results. The primary claims would be greatly strengthened if the authors specified their subsampling method or otherwise trial-matched the data before comparing groups.
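Trial matching by random subsampling could look like the sketch below. It assumes one value per trial per condition (e.g., peak signal) in a hypothetical dictionary layout. On each iteration it draws, without replacement, as many trials from every condition as the smallest condition contains. It then summarizes the distribution of subsampled means. The function name and defaults are illustrative assumptions.

```python
import random
import statistics


def trial_matched_means(trials_by_condition, n_iter=1000, seed=0):
    """Compare conditions on equal trial counts by repeated random subsampling.

    trials_by_condition: dict mapping condition -> list of per-trial values.
    Returns the subsample size and, per condition, the mean of the subsampled
    means plus their 2.5th and 97.5th percentiles across iterations. (The
    smallest condition is always used in full, so its interval collapses.)
    """
    rng = random.Random(seed)
    n_min = min(len(v) for v in trials_by_condition.values())
    summary = {}
    for cond, values in trials_by_condition.items():
        means = [statistics.fmean(rng.sample(values, n_min)) for _ in range(n_iter)]
        cuts = statistics.quantiles(means, n=40)  # cut points every 2.5%
        summary[cond] = (statistics.fmean(means), cuts[0], cuts[-1])
    return n_min, summary
```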
Predicted Internal State: The authors' primary claim concerns not only internal state but also predicted internal state. One of the more convincing results supporting this claim is Fig 2Q, in which mice respond to SMS when hungry and to H2O when thirsty, but not to H2O when hungry, suggesting an effect specific to internal state. The impact of these results is limited by the lack of within-session reward variation. In the current task structure, the reward never changes, which makes conclusions about internal state prediction specifically difficult to draw. This claim could be strengthened by an experiment in which reward identity varies (for example, a modification in which mice can distinguish trials in which they expect H2O from trials in which they expect SMS). Alternatively, the authors could deliver an unexpected reward (or no reward) when animals expect one; if PVT(D2R+) neurons track internal state predictions, this would reasonably be expected to generate a reward prediction error. Such experiments would justify the internal state prediction claim; otherwise, the authors could revise their claims to avoid conclusions about internal state prediction.
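To make the predicted signature concrete, a minimal Rescorla-Wagner sketch shows the expected sign of the prediction error when a well-predicted reward is omitted or an unpredicted reward is delivered. Rescorla-Wagner is a standard, deliberately simple learning rule chosen here only for illustration and is not a model used by the authors. The learning rate and the reward coding (1 for reward, 0 for omission) are assumptions.

```python
def rescorla_wagner_errors(rewards, alpha=0.2, v0=0.0):
    """Return the prediction error (reward minus expected value) on each trial.

    rewards: sequence of delivered reward magnitudes (0 for omission).
    alpha: learning rate; the expected value v moves toward each outcome.
    """
    v, errors = v0, []
    for r in rewards:
        delta = r - v
        errors.append(delta)
        v += alpha * delta
    return errors


if __name__ == "__main__":
    # 50 rewarded trials, then one omission probe.
    errors = rescorla_wagner_errors([1.0] * 50 + [0.0])
    print(round(errors[0], 3), round(errors[-1], 3))
```

Under these assumptions the error is +1 on the first (unpredicted) reward and close to -1 on the omission probe, the kind of sign reversal one would look for in the neural signal.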
Minor Points
Food and water restriction methods for mice are unclear (l. 95-8). The authors state that food was removed and mice were restricted during training, but they do not specify how long animals were restricted or deprived of food or water before recordings. Stating the restriction duration explicitly would improve the clarity of the paper.
Figure 1A-B does not depict the linear maze task; it shows the viral approach and histology, despite being cited for the task in the text (l. 283-5). It would be helpful either to move the task schematic from the first supplementary figure (which includes a diagram of the linear maze task) into the main figure, or to cite Fig. S1 instead.
The Methods describe providing mice with SMS reward for 8 hours on two consecutive days to prevent neophobia (l. 129-30), but this does not appear to have been done for the other rewards (namely sucralose and sucrose). It would be useful to state in the Methods whether this was done for all rewards or, if not, to explain in the main text why.
Competing interests
The authors declare that they have no competing interests.
Use of Artificial Intelligence (AI)
The authors declare that they did not use generative AI to come up with new ideas for their review.
This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/17945709.
Summary
The medial entorhinal cortex (MEC) integrates both self-motion and landmark cues; an open question is whether its cells can flexibly adjust their computations depending on behavioral demands. In this paper, the authors investigate whether MEC cells can dissociate from landmark anchoring to encode reward distance. To do this, they use electrophysiology to record MEC cells during a virtual reality path-integration (PI) task in mice navigating a cue-rich belt. By comparing the firing properties of MEC cells in a 2D open field, a cue-rich PI task, and a cue-rich fixed-reward task, the authors show that both grid cells and border cells remap to encode reward distance. Using computational modeling, they further conclude that the observed changes in grid cell properties can be explained by a combination of a continuous attractor model and a theta interference model in which grid cells receive two theta-frequency inputs whose frequencies diverge. These findings are valuable, as they support the emerging view that MEC cells are not statically anchored to environmental cues but can flexibly shift their reference frame depending on behavioral demands. Overall, the main experiments and modeling are compelling, but additional analyses are needed to support the main claim that MEC grid cells encode reward distance during the PI task in a cue-rich environment.
Major comments:
Quantification of reward-distance coding. The authors convincingly show that MEC cells remap across tasks, as illustrated by differences between belt-aligned and reward-aligned firing maps in Figure 2 and Supplementary Figure 1. However, it remains unclear to what extent reward-distance coding accounts for MEC activity in the PI task. Specifically, the proportion of MEC neurons that encode reward distance is not quantified, and several example cells (e.g., cells 18 and 23 in Figure 2a, and cells 12, 7, and 24 in Supplementary Figure 1) appear to encode running distance rather than reward distance. I recommend that the authors use a decoding or model-comparison approach (e.g., GLM-based encoding models) to quantify the contribution of reward distance across the MEC population. If a neuron encodes reward distance, removing reward-distance predictors from the model should significantly reduce model performance. Such quantification would help clarify the definition and prevalence of reward-distance coding in the PI task. In future experiments, the authors could also consider interleaved probe trials (no reward delivered) to test whether reward is necessary for the observed neural coding in the PI task.
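The model comparison recommended above could take several forms. As a dependency-free stand-in for a full GLM, the sketch below compares two binned-rate (tuning-curve) Poisson encoding models by cross-validated log-likelihood. The reduced model uses belt position only; the full model uses belt position jointly with distance from the last reward. The sample format, bin width, shrinkage prior and trial-based fold scheme are all assumptions for illustration. A Poisson GLM with smooth basis functions (e.g., statsmodels' `GLM` with a Poisson family) would be the more standard choice.

```python
import math
from collections import defaultdict


def _bin(value, width):
    return int(value // width)


def _fit_rates(samples, key, base_rate, prior_weight=1.0):
    """Mean spike count per bin, shrunk toward base_rate so sparse bins stay finite."""
    totals, counts = defaultdict(float), defaultdict(float)
    for s in samples:
        totals[key(s)] += s["spikes"]
        counts[key(s)] += 1
    return {k: (totals[k] + prior_weight * base_rate) / (counts[k] + prior_weight)
            for k in counts}


def _poisson_ll(samples, rates, key, base_rate):
    """Poisson log-likelihood of held-out spike counts; unseen bins use base_rate."""
    ll = 0.0
    for s in samples:
        lam = max(rates.get(key(s), base_rate), 1e-9)
        ll += s["spikes"] * math.log(lam) - lam - math.lgamma(s["spikes"] + 1)
    return ll


def reward_distance_gain(samples, bin_width=0.1, n_folds=5):
    """Cross-validated log-likelihood gain from adding reward distance.

    samples: dicts with 'trial', 'belt_pos' (m), 'reward_dist' (m), 'spikes'.
    Folds are split by trial. A positive gain means reward distance improves
    held-out prediction beyond belt position alone.
    """
    def reduced(s):
        return _bin(s["belt_pos"], bin_width)

    def full(s):
        return (_bin(s["belt_pos"], bin_width), _bin(s["reward_dist"], bin_width))

    gain = 0.0
    for fold in range(n_folds):
        train = [s for s in samples if s["trial"] % n_folds != fold]
        test = [s for s in samples if s["trial"] % n_folds == fold]
        base = max(sum(s["spikes"] for s in train) / len(train), 1e-9)
        gain += (_poisson_ll(test, _fit_rates(train, full, base), full, base)
                 - _poisson_ll(test, _fit_rates(train, reduced, base), reduced, base))
    return gain
```

Applied neuron by neuron, the sign and size of the gain would give one (assumed) operational definition of a reward-distance cell, from which a population proportion could be reported.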
Interpretation of grid-scale reduction and additional population analysis. The authors conclude that grid scale decreases during the PI task based on reduced inter-field distances. However, inter-field distances measured along a 1D trajectory depend on the angle at which the trajectory intersects the 2D hexagonal lattice (Yoon et al., 2016, DOI: 10.1016/j.cell.2018.08.066). Because the authors also report trial-to-trial shifts in firing fields during the PI task, it is unclear whether the apparent grid-scale reduction reflects a change in mapping angle rather than a true change in grid spacing. As the authors already recorded consecutive sessions (OF/PI/Cue), I recommend considering a toroidal analysis (see the example in Wen et al., 2024, DOI: 10.1038/s41586-024-08034-3) to compare how the mapping between movement trajectory and grid cell population activity changes across tasks. If 'reward resetting' occurs, the activity bump should occupy a fixed location on the 2D neural torus whenever the animal passes the reward location, and each revolution of the bump around the torus should correspond to a reward-to-reward trial rather than to a belt cycle. The toroidal analysis would also allow the anchoring strength of reward relative to landmarks to be quantified across conditions.
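The geometric point about 1D slices can be illustrated with a toy calculation. The sketch places field centres on a hexagonal lattice with a fixed spacing and draws a straight track at a chosen angle and lateral offset. It records where the track passes within a hypothetical field radius of a centre. The function names and parameters are illustrative assumptions, and this is not the analysis of Yoon et al. With the true grid spacing held constant, the mean inter-field distance along the track can come out equal to the spacing, larger, or smaller, depending only on the track's angle and offset.

```python
import math


def fields_along_track(spacing, angle_deg, offset, field_radius, track_length=10.0):
    """Positions along a straight track where it passes through grid fields.

    Field centres sit on a hexagonal lattice with the given spacing. The track
    is the line through (0, offset) at angle_deg to the lattice's first axis;
    a field counts as crossed if its centre lies within field_radius of the
    line. Assumes the offset is small relative to track_length.
    """
    theta = math.radians(angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)  # unit vector along the track
    nx, ny = -uy, ux                           # unit normal to the track
    a1 = (spacing, 0.0)                        # lattice basis vectors
    a2 = (spacing / 2, spacing * math.sqrt(3) / 2)
    n = int(track_length / spacing) + 3
    positions = []
    for i in range(-2 * n, 2 * n + 1):
        for j in range(-2 * n, 2 * n + 1):
            cx = i * a1[0] + j * a2[0]
            cy = i * a1[1] + j * a2[1] - offset  # coordinates relative to (0, offset)
            along = cx * ux + cy * uy
            across = cx * nx + cy * ny
            if 0.0 <= along <= track_length and abs(across) <= field_radius:
                positions.append(along)
    return sorted(positions)


def mean_interfield_distance(positions):
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)
```

For a spacing of 1, a track along a lattice axis through field centres gives a mean distance of 1. The same track rotated by 30 degrees gives about 1.73, and an axis-parallel track running between two rows of wide fields gives 0.5. An apparent change in inter-field distance therefore need not imply a change in grid spacing.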
Clarification of reward delivery. The authors should consider elaborating on which aspect of reward delivery entrains the grid-cell pattern. It is possible that sensory cues associated with reward delivery serve as the most reliable landmark, with the mouse using running distance to estimate the reward location during the PI task. In the Methods section, it would be helpful to clarify how the reward is delivered: automatically upon entering a reward zone, or contingent on behavioral criteria such as slowing or anticipatory licking. Given the behavioral analyses referenced in Figures 3a and 3g, quantifying behavioral engagement (misses, earlier licking, etc.) would also help clarify whether reward-distance coding depends on active path integration and whether animals switch between reference frames.
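One simple engagement measure of this kind compares the fraction of licks made in an anticipatory window before each reward with the fraction expected if licks were spread uniformly along each reward-to-reward journey. The sketch below assumes a hypothetical data layout (journey length plus the remaining distance to reward at each lick) and a 0.2 m window. The appropriate window would depend on the task's reward zone.

```python
def anticipatory_lick_index(journeys, window=0.2):
    """Observed vs. chance fraction of licks made within `window` m of reward.

    journeys: list of (journey_length_m, lick_distances), where lick_distances
    are the remaining distances (m) to the upcoming reward at each lick.
    Chance assumes licks are spread uniformly along each journey.
    """
    observed = chance = 0.0
    n_licks = 0
    for length, lick_distances in journeys:
        for d in lick_distances:
            n_licks += 1
            if 0.0 <= d <= window:
                observed += 1
            chance += min(window, length) / length
    if n_licks == 0:
        return float("nan"), float("nan")
    return observed / n_licks, chance / n_licks
```

An observed fraction well above chance would indicate that the animal anticipates the reward location rather than licking indiscriminately.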
Minor comments:
In the Results section "Identification of grid and border cells," the term "trials" is used to refer to reward-to-reward journeys in the PI task. This is potentially confusing because "trials" is also used for belt cycles in the fixed-reward cue-rich task and again in Figure 2c (belt-aligned vs. reward-aligned trials). Using distinct terms, e.g., "belt cycles" and "reward-to-reward journeys," would improve clarity.
In Figure 3a, it would be helpful to annotate how the correlation matrix was computed and how comparisons were made across animals and sessions. It is also unclear whether the example matrix represents a single session or aggregated data.
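For reference, one common way such a matrix is built is to correlate every pair of single-trial rate maps. The sketch assumes each trial's map is a list of firing rates over the same spatial bins (belt-aligned or reward-aligned). This is an assumption about the figure, not a description of the authors' procedure, and the function names are hypothetical.

```python
import statistics


def _pearson(x, y):
    """Pearson correlation; NaN if either map has no variance."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def trial_correlation_matrix(rate_maps):
    """Square matrix of pairwise correlations between single-trial rate maps."""
    n = len(rate_maps)
    return [[_pearson(rate_maps[i], rate_maps[j]) for j in range(n)] for i in range(n)]
```

Stating whether the matrix is computed per cell, per session, or from population vectors pooled across animals would resolve the ambiguity raised above.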
Competing interests
The authors declare that they have no competing interests.
Use of Artificial Intelligence (AI)
The authors declare that they did not use generative AI to come up with new ideas for their review.