Reactivation strength during cued recall is modulated by graph distance within cognitive maps

Curation statements for this article:
  • Curated by eLife


    eLife assessment

    This magnetoencephalography study reports important new findings regarding the nature of memory reactivation during cued recall. It replicates previous work showing that such reactivation can be sequential or clustered, with sequential reactivation being more prevalent in low performers. It adds convincing evidence, even though based on limited amounts of data, that high memory performers tend to show simultaneous (i.e., clustered) reactivation, varying in strength with item distance in the learned graph structure. The study will be of interest to scientists studying memory replay.


Abstract

Declarative memory retrieval is thought to involve reinstatement of neuronal activity patterns elicited and encoded during a prior learning episode. Furthermore, it is suggested that two mechanisms operate during reinstatement, dependent on task demands: individual memory items can be reactivated simultaneously as a clustered occurrence or, alternatively, replayed sequentially as temporally separate instances. In the current study, participants learned associations between images that were embedded in a directed graph network and retained this information over a brief 8 min consolidation period. During a subsequent cued recall session, participants retrieved the learned information while undergoing magnetoencephalographic recording. Using a trained stimulus decoder, we found evidence for clustered reactivation of learned material. Reactivation strength of individual items during clustered reactivation decreased as a function of increasing graph distance, an ordering present solely for successful retrieval but not for retrieval failure. In line with previous research, we found evidence that sequential replay was dependent on retrieval performance and was most evident in low performers. The results provide evidence for distinct performance-dependent retrieval mechanisms, with graded clustered reactivation emerging as a plausible mechanism to search within abstract cognitive maps.

Article activity feed

  1. Author response:

    The following is the authors’ response to the previous reviews.

    Reviewer #1 (Recommendations For The Authors):

    Results showing reactivation for near and far items separately are now included in Fig. 5 and convincingly suggest a simultaneous reactivation. For me, the open question remaining (see public review) is the degree to which the methods used here to show clustered vs sequential reactivation are mutually exclusive, and whether the pre-selection of a time window of peak reactivation (based on all future items) biases the analyses towards clustered reactivation. The manuscript would benefit from a brief discussion of these issues.

    We have added a brief discussion of these issues. However, we want to clarify a minor point of the public review: while our interpretation implies that replay and reactivation are probably mutually exclusive within a single retrieval event, it does not imply that strategies cannot vary across different retrieval events of the same participant. Nevertheless, we want to address the concern raised (that is, if we understand correctly, that replay events contained within the time window of the reactivation analysis could not be distinguished by the chosen methods) and have added it to the discussion.

    The corresponding sentence reads:

    “[…] Finally, we want to acknowledge that by selecting a time window for the clustered reactivation analysis, we cannot distinguish very fast replay events (≤30 ms) from clustered reactivation if they are contained exactly within the specific reactivation analysis time window.”

    Reviewer #2 (Recommendations For The Authors):

    Figure 5D shows the difference scores between near vs. distant items for learning and retrieval. Similar to Figure 5 from the first version of your paper, the difference score does not show whether reactivation of the near vs. distant items change from learning to retrieval. You could show this change in a 2 (near vs. distant) x 2 (learning vs. retrieval) box plot (corresponding to Figure 5A).

    We have added the requested plot as Supplement 9 and referred to it in the figure description. However, comparing absolute, raw probabilities between different blocks is tricky, as baseline probabilities vary over time (e.g., due to shifts in distance to the sensors). Differential reactivation, being a relative measure, might therefore be better suited for comparisons between blocks.

    At the end of the results section, you state: "On average, differential reactivation probability increased from pre to post resting state (Figure 5D).". I would suggest providing some statistical comparison and the corresponding values.

    We have calculated a t-test, added the corresponding p-value, and now report that the increase is only descriptive and not statistically significant.
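A minimal sketch of this kind of pre vs post comparison follows. It assumes a paired (within-participant) t-test, a sampling rate of 100 Hz with time zero at cue onset, and the 220-260 ms window reported for the learning-trial analysis. The response does not state which t-test variant was used, and all data below are random placeholders, not the study's values.

```python
import numpy as np
from scipy import stats

# Assumptions (not stated in the response): 100 Hz sampling, t = 0 at cue onset.
SFREQ_HZ = 100
WINDOW_MS = (220, 260)


def window_mean(diff_timecourse, window_ms=WINDOW_MS, sfreq=SFREQ_HZ):
    """Mean of a differential-reactivation time course (near minus distant)
    over an inclusive window given in ms after cue onset."""
    start = int(round(window_ms[0] * sfreq / 1000))
    stop = int(round(window_ms[1] * sfreq / 1000)) + 1  # +1: include the end sample
    return float(np.mean(diff_timecourse[start:stop]))


# Random placeholder time courses, one per participant for learning (pre-rest)
# and retrieval (post-rest). These are NOT the study's data.
rng = np.random.default_rng(0)
n_participants, n_times = 20, 50
pre = rng.normal(0.0, 0.05, size=(n_participants, n_times))
post = rng.normal(0.0, 0.05, size=(n_participants, n_times))

pre_scores = np.array([window_mean(tc) for tc in pre])
post_scores = np.array([window_mean(tc) for tc in post])

# Paired test: each participant contributes one value per phase.
result = stats.ttest_rel(post_scores, pre_scores)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```

A paired test is the natural choice here because the same participants appear in both phases; an unpaired test would ignore that between-participant variance is shared.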


  3. Reviewer #1 (Public Review):

    Summary:

    Previous work in humans and non-human animals suggests that during offline periods following learning, the brain replays newly acquired information in a sequential manner. The present study uses a MEG-based decoding approach to investigate the nature of replay/reactivation during a cued recall task directly following a learning session, where human participants are trained on a new sequence of 10 visual images embedded in a graph structure. During retrieval, participants are then cued with two items from the learned sequence, and neural evidence is obtained for the simultaneous or sequential reactivation of future sequence items. The authors find evidence for both sequential and clustered (i.e., simultaneous) reactivation. Replicating previous work, low-performing participants tend to show sequential, temporally segregated reactivation of future items, whereas high-performing participants show more clustered reactivation. Adding to previous work, the authors show that an image's reactivation strength varies depending on its proximity to the retrieval cue within the graph structure.

    Strengths:

    As the authors point out, work on memory reactivation has largely been limited to the retrieval of single associations. Given the sequential nature of our real-life experiences, there is clearly value in extending this work to structured, sequential information. State-of-the-art decoding approaches for MEG are used to characterize the strength and timing of item reactivation. The manuscript is very well written with helpful and informative figures in the main sections. The task includes an extensive localizer with 50 repetitions per image, allowing for stable training of the decoders and the inclusion of several sanity checks demonstrating that on-screen items can be decoded with high accuracy.

    Weaknesses:

    Of major concern, the experiment is not optimally designed for analysis of the retrieval task phase, where only 4 min of recording time and a single presentation of each cue item are available for the analyses of sequential and non-sequential reactivation. In their revision, the authors include data from the learning blocks in their analysis. These blocks follow the same trial structure as the retrieval task, and apart from adding more data points could also reveal a possible shift from sequential to clustered reactivation as learning of the graph structure progresses. The new analyses are not entirely conclusive, maybe given the variability in the number of learning blocks that participants require to reach the criterion. In principle, they suggest that reactivation strength increases from learning (pre-rest) to final retrieval (post-rest).

    On a more conceptual note, the main narrative of the manuscript implies that sequential and clustered reactivation are mutually exclusive, such that a single participant would show either one or the other type. With the analytic methods used here, however, it seems possible to observe both types of reactivation. For example, the observation that mean reactivation strength (across the entire trial, or in a given time window of interest) varies with graph distance does not exclude the possibility that this reactivation is also sequential. In fact, the approach of defining one peak time window of reactivation may bias towards simultaneous, graded reactivation. It would be helpful if the authors could clarify this conceptual point. A strong claim that the two types of reactivation are mutually exclusive would need to be substantiated by further evidence, for instance, a suitable metric contrasting "sequenceness" vs "clusteredness".

    On the same point, the non-sequential reactivation analyses use a time window of peak decodability that is determined based on the average reactivation of all future items, irrespective of graph distance. In a sequential forward cascade of reactivations, it could be assumed that the reactivation of near items would peak earlier than the reactivation of far items. In the revised manuscript, the authors now show the "raw" timecourses of item decodability at different graph distances, clearly demonstrating their peak reactivation times, which show convincingly that reactivation for near and far items occurs at very similar time points. The question that remains, therefore, is whether the method of pre-selecting a time window of interest described above could exert a bias towards finding clustered reactivation.

  4. Reviewer #2 (Public Review):

    Summary:

    The authors investigate replay (defined as sequential reactivation) and clustered reactivation during retrieval of an abstract cognitive map. Replay and clustered reactivation were analysed based on MEG recordings combined with a decoding approach. While the authors report finding evidence for both replay and clustered reactivation during retrieval, replay was present exclusively in low performers. Further, the authors show that reactivation strength declined with increasing graph distance.

    Strengths:

    The paper raises interesting research questions, i.e., replay vs. clustered reactivation and how that supports retrieval of cognitive maps. The paper is well written, well structured and easy to follow. The methodological approach is convincing and definitely suited to address the proposed research questions.

    The paper is a great combination of replicating previous findings (Wimmer et al. 2020) with a new experimental approach while at the same time presenting novel evidence (reactivation strength declines as a function of graph distance).

    What I also want to positively highlight is their general transparency. For example, they pre-registered this study but with a focus on a different part of the data and outlined this explicitly in the paper.

    The paper has very interesting findings. However, there are some shortcomings, especially in the experimental design. These are briefly outlined below but are also discussed openly and in detail by the authors.

    Weaknesses:

    The individual findings are interesting. However, due to some shortcomings in the experimental design they cannot be profoundly related to each other. For example, the authors show that replay is present in low but not in high performers with the assumption that high performers tend to simultaneously reactivate items. But then, the authors do not investigate clustered reactivation (= simultaneous reactivation) as a function of performance due to a low number of retrieval trials and ceiling performance in most participants.
    As a consequence of the experimental design, some analyses are underpowered (very low number of trials, n = ~10, and for some analyses, very low number of participants, n = 14).

  5. Author response:

    The following is the authors’ response to the original reviews.

    Reviewer 1

    (1) Given the low trial numbers, and the point of sequential vs clustered reactivation mentioned in the public review, it would be reassuring to see an additional sanity check demonstrating that future items that are currently not on-screen can be decoded with confidence, and if so, when in time the peak reactivation occurs. For example, the authors could show separately the decoding accuracy for near and far items in Fig. 5A, instead of plotting only the difference between them.

    We have now added the requested analysis showing the raw decoded probabilities for near and distant items separately in Figure 5A. We have also chosen to replace Figure 5B with the new figure as we think it provides more information than the previous Figure 5B. Instead, we have moved Figure 5B to the supplement. The median peak decoded accuracy for near and distant items is equivalent. We have added the following description to the figure:

    “Decoded raw probabilities for off-screen items that were up to two steps ahead of the current stimulus cue (‘near’) vs. distant items that were more than two steps away on the graph, on trials with correct answers. The median peak decoded probability for near and distant items was at the same time point for both probability categories. Note that displayed lines reflect the average probability while, to eliminate the influence of outliers, the peak displays the median.”
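The distinction drawn in this legend between the mean curve and the median peak can be sketched as follows. The helpers are hypothetical and assume one decoded-probability time course per participant on a shared time axis in 10 ms steps. The toy curves are made up purely to show the computation.

```python
import numpy as np


def peak_latency_ms(prob_timecourse, times_ms):
    """Latency (ms) at which one decoded-probability time course is maximal."""
    return float(times_ms[int(np.argmax(prob_timecourse))])


def median_peak_ms(timecourses, times_ms):
    """Median over participants of each participant's own peak latency.

    Unlike the peak of the grand-average curve, a single participant with an
    unusually early or late peak shifts this value only slightly.
    """
    return float(np.median([peak_latency_ms(tc, times_ms) for tc in timecourses]))


# Toy example (not study data): three participants peaking at 220, 230 and 400 ms.
times_ms = np.arange(0, 500, 10)
curves = np.array([np.exp(-((times_ms - c) / 30.0) ** 2) for c in (220, 230, 400)])
print(median_peak_ms(curves, times_ms))  # 230.0
```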

    (2) The non-sequential reactivation analyses often use a time window of peak decodability, and it was not entirely clear to me what data this time window is determined on, e.g., was it determined based on all future reactivations irrespective of graph distance? This should be clarified in the methods.

    Thank you for raising this. We now clarify this in the relevant section, which reads: “First, we calculated a time point of interest by computing the peak probability estimate of decoders across all trials, i.e., the average probability for each time point across all trials (except previous on-screen items) and all distances, which is equivalent to the peak of the differential reactivation analysis.”
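A minimal sketch of the time-point-of-interest computation described in this quote, under assumed array shapes: decoded probabilities stored as trials × classes × time, plus a boolean mask marking items to leave out (e.g., items previously on screen in that trial). The function and argument names are hypothetical, not taken from the authors' code.

```python
import numpy as np


def time_point_of_interest(probs, exclude, times_ms):
    """Peak latency of the average decoded probability of the remaining items.

    probs:    (n_trials, n_classes, n_times) decoded probabilities
    exclude:  (n_trials, n_classes) bool, True for items left out of the
              average in that trial (e.g., items already shown on screen)
    times_ms: (n_times,) time axis in ms
    """
    # Replace excluded items with NaN so that nanmean ignores them.
    masked = np.where(exclude[:, :, None], np.nan, probs)
    # Average over trials and all remaining items, regardless of graph distance.
    mean_tc = np.nanmean(masked, axis=(0, 1))
    return float(times_ms[int(np.nanargmax(mean_tc))]), mean_tc


# Toy usage (not study data).
times_ms = np.arange(0, 500, 10)
probs = np.zeros((4, 10, 50))
probs[:, 0, 5] = 1.0    # strong response of an excluded (on-screen) item
probs[:, 2, 25] = 0.5   # off-screen item peaking at 250 ms
exclude = np.zeros((4, 10), dtype=bool)
exclude[:, 0] = True
peak_ms, _ = time_point_of_interest(probs, exclude, times_ms)
print(peak_ms)  # 250.0
```

Because near and distant items are pooled before taking the peak, the selected time point does not depend on graph distance; this is the property Reviewer #1 asks about regarding a possible bias towards clustered reactivation.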

    (3) Fig 4 shows evidence for forward and backward sequential reactivation, suggesting that both forward and backward replay peak at a lag of 40-50msec. It would be helpful if this counterintuitive finding could be picked up in the discussion, explaining how plausible it is, physiologically, to find forward and backward replay at the same lag, and whether this could be an artifact of the TDLM method.

    This is an important point, and we agree that it appears counterintuitive. However, we would highlight that this exact time range has been reported in previous studies, though never for both forward and backward replay. We now include a discussion of this finding. The section now reads:

    “[…] Even though we primarily focused on the mean sequenceness scores across time lags, there appears to be a (non-significant) peak at 40-60 milliseconds. While simultaneous forward and backward replay is theoretically possible, we acknowledge that it is somewhat surprising and, given our paradigm, could relate to other factors such as autocorrelations (Liu, Dolan, et al., 2021).”
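Why forward and backward sequenceness can peak at the same lag becomes clearer from how TDLM estimates them. Below is a simplified sketch in the spirit of the two-level regression described by Liu, Dolan, et al. (2021); it is not the authors' pipeline and omits alpha-rhythm control regressors, permutation-based significance thresholds, trial concatenation, and all preprocessing. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def sequenceness(X, T, lags):
    """Simplified TDLM-style forward and backward sequenceness.

    X:    (n_times, n_states) decoded state probabilities
    T:    (n_states, n_states) hypothesised transitions, T[i, j] = 1 if i -> j
    lags: iterable of positive integer lags in samples
    Returns {lag: (forward, backward)}.
    """
    n_times, n_states = X.shape
    # Second-level design: forward, backward, self-transition and constant terms.
    design2 = np.column_stack([
        T.ravel(), T.T.ravel(), np.eye(n_states).ravel(), np.ones(n_states ** 2)
    ])
    out = {}
    for lag in lags:
        # First level: regress every state at t + lag on all states at t.
        past = np.column_stack([X[:-lag], np.ones(n_times - lag)])
        future = X[lag:]
        beta, *_ = np.linalg.lstsq(past, future, rcond=None)
        beta = beta[:n_states]  # drop the intercept row; beta[i, j]: i at t -> j at t+lag
        # Second level: project the coupling matrix onto forward/backward templates.
        z, *_ = np.linalg.lstsq(design2, beta.ravel(), rcond=None)
        out[lag] = (float(z[0]), float(z[1]))
    return out


# Synthetic check: forward replays of a 5-state chain with a 4-sample lag.
rng = np.random.default_rng(1)
n_states, n_times, true_lag = 5, 5000, 4
T = np.zeros((n_states, n_states))
for i in range(n_states - 1):
    T[i, i + 1] = 1.0
X = rng.normal(0.0, 0.1, size=(n_times, n_states))
for start in rng.integers(0, n_times - n_states * true_lag, size=200):
    for k in range(n_states):
        X[start + k * true_lag, k] += 1.0
res = sequenceness(X, T, lags=range(1, 11))
```

Forward and backward coefficients are separate regressors fitted at every lag, so the method itself does not prevent both from peaking at the same lag. Shared temporal structure in the decoded time courses, such as autocorrelation, can inflate both.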

    (4) It is reported that participants with below 30% decoding accuracy are excluded from the main analyses. It would be helpful if the manuscript included very specific information about this exclusion, e.g., was the criterion established based on the localizer cross-validated data, the temporal generalisation to the cued item (Fig. 2), or only based on peak decodability of the future sequence items? If the latter, is it applied based on near or far reactivations, or both?

    We now clarify this point to include more specific information, which reads:

    “[…] Therefore, we decided a priori that participants with a peak decoding accuracy below 30%, as obtained from the cross-validation of localizer trials, would be excluded from the analysis (nine participants in all).”

    (5) Regarding the low amount of data for the reactivation analysis, the manuscript should be explicit about the number of trials available for each participant. For example, Supplemental Fig. 1 could provide this information directly, rather than the proportion of excluded trials.

    We have adapted the plot in the supplement to show the absolute number of rejected epochs per participant, in addition to the ratio.

    (6) More generally, the supplements could include more detailed information in the legends.

    We agree and have added more extensive explanation of the plots in the supplement legends.

    (7) The choice of comparing the 2 nearest with all other future items in the clustered reactivation analysis should be better motivated, e.g., was this based on the Wimmer et al. (2020) study?

    We have added our motivation for taking the two nearest items and contrasting them with the items further away. The paragraph reads:

    “[…] We chose to combine the two following items for two reasons: first, this doubled the number of included trials; second, this approach made the number of trials in each category (“near” and “distant”) more balanced. […]”
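The near/distant split can be sketched as a breadth-first search over the directed graph. The edge list below is a hypothetical 10-node graph chosen only for illustration, not the study's graph; following the figure legend, "near" means one or two steps ahead of the cue and "distant" more than two steps.

```python
from collections import deque


def graph_distances(edges, source):
    """Shortest directed path length (in steps) from source to each reachable node."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def near_distant(edges, cue, near_max_steps=2):
    """Split items reachable from the cue into 'near' and 'distant' groups."""
    dist = graph_distances(edges, cue)
    near = sorted(n for n, d in dist.items() if 1 <= d <= near_max_steps)
    distant = sorted(n for n, d in dist.items() if d > near_max_steps)
    return near, distant


# Hypothetical graph: a 10-node directed ring plus two shortcut edges.
edges = [(i, (i + 1) % 10) for i in range(10)] + [(2, 6), (7, 1)]
near, distant = near_distant(edges, cue=0)
print(near, distant)  # [1, 2] [3, 4, 5, 6, 7, 8, 9]
```

In a graph like this, a single cue has only two near items but many distant ones, which illustrates why pooling the two nearest items helps balance trial counts between categories.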

    Reviewer 2

    (1) Focus exclusively on retrieval data (and here just on the current image trials).

    If I understand correctly, you focus all your analyses (behavioural as well as MEG analyses) on retrieval data only and here just on the current image trials. I am surprised by that since I see some shortcomings due to that. These shortcomings can likely be addressed by including the learning data (and predecessor image trials) in your analyses.

    a) Number of trials: During each block, you presented each of the twelve edges once. During retrieval, participants then did one "single testing session block". Does that mean that all your results are based on max. 12 trials? Given that participants remembered, on average, 80% this means even fewer trials, i.e., 9-10 trials?

    This is correct and a limitation of the paper. However, while we used only correct trials for the reactivation analysis, the sequential analysis was conducted using all trials disregarding the response behaviour. To retain comparability with previous studies we mainly focused on data from after a consolidation phase. Nevertheless, despite the trial limitation we consider the results are robust and worth reporting. Additionally, based on the suggestion of the referee, we now include results from learning blocks (see below).

    b) Extend the behavioural and replay/reactivation analysis to predecessor images.

    Why do you restrict your analyses to the current image trials? Especially given that you have such a low trial number for your analyses, I was wondering why you did not include the predecessor trials (except the non-deterministic trials, like the zebra and the foot according to Figure 2B) as well.

    We agree that it would be desirable to increase power by adding the predecessor images (excluding the ambiguous trials) to the current image cue analysis. We did not do so because we consider the underlying retrieval processes of these trial types to differ, i.e., they cannot simply be combined. Nevertheless, we have performed the suggested analysis to check whether it increases our power. We found that the reactivation effect is robust and significant at the same time point of 220-230 ms. However, the effect size actually decreased: while peak differential reactivation was previously 0.13, it is now 0.07. This in fact makes conceptual sense. We suspect that the two processes elicited by showing a single cue and by showing a second, related cue are distinct insofar as the predecessor image acts as a primer for the current image, potentially changing the time course/speed of retrieval. Given our concern that the two processes are not the same, we consider it important to avoid mixing these data.

    We have added a statement to the manuscript discussing this point. The section reads:

    “Note that we only included data from the current image cue, and not from the predecessor image cue, as we assume the retrieval processes differ and should not be concatenated.”

    c) Extend the behavioural and replay/reactivation analysis to learning trials.

    Similar to point 1b, why did you not include learning trials in your analyses?

    Including (correct and incorrect) learning trials has the advantage that you do not have to exclude 7 participants due to ceiling performance (100%).

    Further, you could actually test the hypothesis that you outline in your discussion: "This implies that there may be a switch from sequential replay to clustered reactivation corresponding to when learned material can be accessed simultaneously without interference." Accordingly, you would expect to see more replay (and less "clustered" reactivation) in the first learning blocks compared to retrieval (after the rest period).

    Tracking reactivation and replay over the course of learning is a great idea. We have given a lot of thought to how to integrate these findings but have not found a satisfying solution, as the analysis of the learning data turned out to be quite tricky: we decided that each participant should perform as many blocks as necessary to reach at least 80% (with an upper limit of six and a lower bound of two, see Supplement Figure 4). Indeed, some participants learned 100% of the sequence after one block (these were mostly medical students, for whom learning things by heart is a daily task). With the benefit of hindsight, we realise our design means that different blocks are not directly comparable between participants. In theory, we would expect replay to emerge in parallel with learning and then gradually change to clustered reactivation as memory traces become consolidated/stronger. However, it is unclear when replay should emerge and when precisely a switch to clustered reactivation would happen. For this reason, we initially decided not to include the learning trials in the paper.

    Nevertheless, to provide some insight into the learning process, and to see how consolidation impacts differential reactivation and replay, we have split our data into pre and post resting state, aggregating all learning trials of each participant. While this does not allow us to track processes on a block basis, it does offer potential (albeit limited) insight into the hypothesis we outline in the discussion.

    For reactivation, we see a clear increase, further strengthening the outlined hypothesis; for replay, however, the evidence is less clear, as we do not know over how many learning blocks replay is expected.

    We calculated individual trajectories of how reactivation and replay change from learning to retrieval and related these to performance. Indeed, an increase in reactivation is nominally associated with higher learning performance, while an increase in replay strength is associated with lower performance (both non-significant). However, for the reasons mentioned above, we think it would be premature to add this weak evidence to the paper.

    To mitigate problems of experimental design in relation to this question, we are currently implementing a follow-up study, in which we aim to normalize the learning process across participants and index how replay/reactivation changes over the course of learning and after consolidation.

    We have added plots showing clustered reactivation and sequential replay measures during learning (Figure 5D and Supplement 8).

    The added section(s) now read:

    “To provide greater detail on how the 8-minute consolidation period affected reactivation we, post-hoc, looked at relevant measures across learning trials in contrast to retrieval trials. For all learning trials, for each participant, we calculated differential reactivation for the same time point we found significant in the previous analysis (220-260 milliseconds). On average, differential reactivation probability increased from pre to post resting state (Figure 5D). […]

    Nevertheless, even though our results show a nominal increase in reactivation from learning to retrieval (see Figure 5D), due to experimental design features our data do not enable us to test for a hypothesized switch from sequential replay to clustered reactivation (see also “limitations” and Supplement 8).”

    d) Introduction (last paragraph): "We examined the relationship of graph learning to reactivation and replay in a task where participants learned a ..." If all your behavioural analyses are based on retrieval performance, I think that you do not investigate graph learning (since you exclusively focus the analyses on retrieving the graph structure). However, relating the graph learning performance and replay/reactivation activity during learning trials (i.e., during graph learning) to retrieval trials might be interesting but beyond the scope of this paper.

    We agree. We have changed the wording to be more accurate. Indeed, we do not examine graph learning but instead examine retrieval from a graph, after graph learning. The sentence in question now reads:

    “[…] relationship of retrieval from a learned graph structure to reactivation [...]”

    e) It is sometimes difficult to follow what phase of the experiment you refer to since you use the terms retrieval and test synonymously. Not a huge problem at all but maybe you want to stick to one term throughout the whole paper.

    Thank you for pointing this out. We have now adapted the manuscript to exclusively refer to “retrieval” and not to “test”.

    (2) Is your reactivation clustered?

    In Figure 5A, you compare the reactivation strength of the two items following the cue image (i.e., current image trials) with items further away on the graph. I do not completely understand why your results are evidence for clustered reactivation in contrast to replay.

    First, it would be interesting to see the reactivation of near vs. distant items before taking the difference (time course of item probabilities).

    (copied answer from response to Reviewer 1, as the same remark was raised)

    We have added the requested analysis showing the raw decoded probabilities for near and distant items separately in Figure 5A. We have chosen to replace Figure 5B with the new figure as we think that it offers more information than the previous Figure 5B. Instead, we have moved Figure 5B to the supplement. The median peak decoded accuracy for near and distant items is equivalent. We have added the following description to the figure:

    “Decoded raw probabilities for off-screen items that were up to two steps ahead of the current stimulus cue (‘near’) vs. distant items that were more than two steps away on the graph, on trials with correct answers. The median peak decoded probability for near and distant items was at the same time point for both probability categories. Note that displayed lines reflect the average probability while, to eliminate the influence of outliers, the peak displays the median.”

    Second, could it still be that the first item is reactivated before the second item? By averaging across both items, it is not apparent what the time courses of probabilities of both items look like (and whether they follow a sequential pattern). Additionally, the Gaussian smoothing kernel across the time dimension might diminish sequential reactivation and favour clustered reactivation. (In the manuscript, what does a Gaussian smoothing kernel of σ = 1 refer to?) Could you please explain in more detail why you assume non-sequential clustered reactivation here and substantiate this with additional analyses?

    We apologise for the unclear description. Note that the Gaussian kernel is only used for the reactivation analysis and not for the replay analysis, so any small temporal successions would have been picked up by the sequential analysis. We now clarify this in the respective section of the sequential analysis and also explain the parameter σ = 1 in the reactivation analysis section. The paragraphs now read:

    “[…] As input for the sequential analysis, we used the raw probabilities of the ten classifiers corresponding to the stimuli. [...]

    […] Therefore, to address this we applied a Gaussian smoothing kernel (using scipy.ndimage.gaussian_filter with σ=1 and otherwise default parameters, which corresponds approximately to taking the surrounding time steps in both directions with the following weighting: current time step: 40%, ±1 step: 25%, ±2 steps: 5%, ±3 steps: 0.5%) [...]”
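The quoted weights can be checked directly by smoothing a unit impulse, because the filtered impulse is the kernel itself. This sketch uses scipy.ndimage.gaussian_filter1d, which applies the same normalized Gaussian kernel along a single axis, and assumes the default truncate=4.0 (kernel radius of 4 samples at σ = 1).

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# A unit impulse far from the edges, so boundary handling plays no role.
impulse = np.zeros(21)
impulse[10] = 1.0
weights = gaussian_filter1d(impulse, sigma=1)

for offset in range(5):
    # Weight applied to the sample `offset` steps away (same on each side).
    print(f"offset ±{offset}: {weights[10 + offset]:.4f}")
# offset ±0: 0.3989, ±1: 0.2420, ±2: 0.0540, ±3: 0.0044, ±4: 0.0001
```

The exact weights (about 40%, 24%, 5.4% and 0.4% per side) match the rounded figures in the quote, and the kernel is symmetric, so smoothing cannot by itself shift the latency of a single peak.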

    (3) Replay and/or clustered reactivation?

    The relationship between the sequential forward replay, differential reactivation, and graph reactivation analysis is not really apparent. Wimmer et al. demonstrated that high performers show clustered reactivation rather than sequential reactivation. However, you did not differentiate in your differential reactivation analysis between high vs. low performers. (You point out in the discussion that this is due to a low number of low performers.)

    We agree that a split into high vs low performers would have been preferable for our analysis. However, one major obstacle made us opt for a correlational analysis instead: we employed criterion learning, rendering a categorical grouping conceptually biased. Even though not all participants reached the criterion of 80%, our sample did not naturally split into high and low performers but was biased towards higher performance, leaving the groups uneven. The median performance was 83% (mean ~81%), with six of our participants (about a quarter of included participants) having exactly this performance. This makes a median or mean split difficult, as the choice of how to assign these participants would strongly affect the results. We have added a limitations section in which we discuss in detail this shortcoming and our reasoning for not performing a median split as in Wimmer et al. (2020). The section now reads:

    “There are some limitations to our study, most of which originate from a suboptimal study design. [...], as we performed criterion learning, a sub-group analysis as in Wimmer et al. (2020) was not feasible, as median performance in our sample would have been 83% (mean 81%), with six participants exactly at that threshold. [...]”

    It might be worth trying to bring the analysis together, for example by comparing sequential forward replay and differential reactivation at the beginning of graph learning (when performance is low) vs. retrieval (when performance is high).

    Thank you for the suggestion to include the learning segments, which we think improves the paper quite substantially. However, the analysis of the learning data turned out to be quite tricky: we had decided that each participant should perform as many blocks as necessary to reach at least 80% accuracy (with an upper limit of six and a lower bound of two, see Supplement Figure 4). Some participants learned 100% of the sequence after one block (these were mostly medical students, for whom learning things by heart is a daily task). In hindsight, this is an unfortunate design feature with respect to learning, as it means different blocks are not directly comparable between participants.

    In theory, we would expect that replay emerges in parallel with learning and then gradually changes to clustered reactivation as memory traces become consolidated/stronger. However, it is unclear when replay would emerge and when the switch to reactivation would happen. For this reason, we initially decided not to include the learning trials in the paper at all.

    Nevertheless, to give some insight into the learning process and to see how consolidation affects differential reactivation and replay, we have split our data into trials before and after the resting state, aggregating all learning trials of each participant. While this does not allow us to track the measures of interest block by block, it gives some (albeit limited) insight into the hypothesis outlined in our discussion.

    For reactivation, we see a clear increase, further strengthening the outlined hypothesis. However, for replay the evidence is less clear, potentially because we do not know across how many learning blocks replay is to be expected.

    The added section(s) now read:

    “To examine how the 8-minute consolidation period affected reactivation, we looked post hoc at the relevant measures during learning trials in contrast to retrieval trials. For all learning trials of each participant, we calculated differential reactivation for the time window found significant in the previous analysis (220-260 milliseconds). On average, differential reactivation probability increased from pre to post resting state (Figure 5D).

    […]

    Nevertheless, even though our results show a nominal increase in reactivation from learning to retrieval (see Figure 5D), our data do not enable us to show the hypothesized switch for sequential replay (see also “limitations” and Supplement 8).”
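    The pre/post comparison described above reduces, per participant, to averaging a reactivation timecourse within the 220-260 ms window for learning trials and for retrieval trials, then taking the difference. The sketch below shows only that averaging step. It assumes a differential-reactivation timecourse has already been computed (its exact definition is not given here), and the 10 ms time grid and the values are hypothetical.

```python
def window_mean(times_ms, values, start_ms=220, end_ms=260):
    """Mean of a per-time-point measure within [start_ms, end_ms] (inclusive)."""
    inside = [v for t, v in zip(times_ms, values) if start_ms <= t <= end_ms]
    if not inside:
        raise ValueError("no samples fall inside the window")
    return sum(inside) / len(inside)


# Hypothetical 10 ms time grid from -100 to 500 ms (the sampling rate is an assumption).
times = list(range(-100, 501, 10))

# Hypothetical differential-reactivation timecourses for one participant,
# averaged over learning trials (pre-rest) and retrieval trials (post-rest).
pre_rest = [0.10 for _ in times]
post_rest = [0.16 if 220 <= t <= 260 else 0.10 for t in times]

change = window_mean(times, post_rest) - window_mean(times, pre_rest)
print(f"pre->post change in 220-260 ms window: {change:+.3f}")
```

    Averaging these per-participant differences (or testing them against zero) would give the group-level pre/post comparison; this sketch does not reproduce the statistics used in the manuscript.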

    Additionally, the main research question is not that clear to me. Based on the introduction, I thought the focus was on replay vs. clustered reactivation and high vs. low performance (which I think is really interesting). However, the title is more about reactivation strength and graph distance within cognitive maps. Are these two research questions related? And if so, how?

    We agree we need to be clearer on this point. We have added two sentences to the introduction, which should address this point. The section now reads:

    “[…] In particular, the question remains how the brain keeps track of graph distances for successful recall and whether the previously found difference between high and low performers also holds true within a more complex graph learning context.”

    (4) Learning the graph structure.

    I was wondering whether you have any behavioural measures to show that participants actually learn the graph structure (instead of just pairs or triplets of objects). For example, do you see that participants chose the distractor image that was closer to the target more frequently than the distractor image that was further away (close vs. distal target comparison)? It should be random at the beginning of learning but might become more biased towards the close target.

    Thanks, this is an excellent suggestion. Our analysis indeed shows that people take the near lure more often than the far lure in later blocks, while it is random in the first block.

    Nevertheless, we have decided to put these data into the supplement and reference them in the text. This is because analysis of the learning blocks is challenging and, in general, biased. Each participant had a different number of learning blocks based on their learning rate, and this makes it difficult to compare learning across participants. We have tried our best to accommodate and explain these difficulties in the figure legend. Nevertheless, we thank the referee for the guidance here; this analysis indeed provides further evidence that participants learned the actual graph structure.

    The added section reads:

    “Additionally, we have included an analysis showing that the wrong answers participants provided were random in the first block and biased towards closer graph nodes in later blocks. This is consistent with participants actually learning the underlying graph structure as opposed to independent triplets (see figure and legend of Supplement 6 for details).”
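    A minimal sketch of the near-versus-far lure analysis: for each block, count how many incorrect answers chose the lure closer to the target in the graph, and compare that fraction against the 0.5 expected under random guessing between the two lures. The one-sided binomial test is one possible choice and is not necessarily the test used in Supplement 6; the error labels are hypothetical.

```python
from collections import Counter
from math import comb


def near_lure_fraction(error_types):
    """Fraction of wrong answers that chose the near (graph-closer) lure.

    error_types: one "near" or "far" label per incorrect trial.
    With random guessing between the two lures this is expected to be ~0.5.
    """
    counts = Counter(error_types)
    total = counts["near"] + counts["far"]
    if total == 0:
        raise ValueError("no near/far errors to score")
    return counts["near"] / total


def binomial_sf(k, n, p=0.5):
    """One-sided P(X >= k) for X ~ Binomial(n, p): near-lure bias above chance."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical error labels per learning block, NOT the study's data.
blocks = {
    1: ["near", "far", "far", "near", "far", "near"],
    3: ["near", "near", "far", "near", "near", "near"],
}
for block, errs in blocks.items():
    k = errs.count("near")
    frac = near_lure_fraction(errs)
    print(f"block {block}: near fraction={frac:.2f}, P(X>={k})={binomial_sf(k, len(errs)):.3f}")
```

    Because participants completed different numbers of blocks, pooling such fractions across participants inherits the comparability problem described above.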

    (5) Minor comments

    a) "Replay analysis relies on a successive detection of stimuli where the chance of detection exponentially decreases with each step (e.g., detecting two successive stimuli with a chance of 30% leaves a 9% chance of detecting the replay event). " Could you explain in more detail why 30% is a good threshold then?

    Thank you. We have further clarified the section. As we are working mainly with probabilities, it is useful to keep in mind that accuracy is a class-label metric that provides only a rough estimate of classifier ability. A metric such as top-3 accuracy would be more informative, but of limited value with only 10 classes.

    Nevertheless, subtle changes in probability estimates are present and can be picked up by the methods we employ. Therefore, the 30% threshold is a rough lower bound, chosen based on pilot data showing that clean MEG data from attentive participants can usually reach it. The section now reads:

    “(e.g., detecting two successive stimuli with a chance of 30% leaves a 9% chance of detecting a replay event). However, one needs to bear in mind that accuracy is a “winner-takes-all” metric indicating only whether the class assigned the highest probability is the correct one, disregarding subtle, relative changes in assigned probability. As the methods used in this analysis operate on probability estimates and not class labels, one can expect the 30% to be a rough lower bound and the actual sensitivity of the analysis to be higher. Additionally, based on pilot data, we found that attentive participants were able to reach 30% decodability, allowing us to use decodability as a data quality check.”
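    The two points in this passage can be made concrete with a short sketch. First, if each item of a replay event is detected independently with probability p, all n items are detected with probability p to the power n (0.3 squared gives the quoted 9%); independence is a simplifying assumption. Second, top-1 accuracy can stay unchanged while the probability assigned to the target rises, which probability-based methods can still exploit. The 3-class outputs below are hypothetical.

```python
def chain_detection_probability(p_item, n_items):
    """Chance of detecting all n_items of a replay event, assuming each item is
    detected independently with probability p_item."""
    return p_item ** n_items


def top1_correct(probs, target):
    """Winner-takes-all score for one trial: 1 if target has the highest probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return int(best == target)


for n in (1, 2, 3):
    print(n, round(chain_detection_probability(0.3, n), 4))

# Two hypothetical 3-class outputs: the target (class 1) gains probability,
# but the top-1 score is 0 in both cases.
before = [0.50, 0.30, 0.20]
after = [0.40, 0.35, 0.25]
print(top1_correct(before, 1), top1_correct(after, 1), round(after[1] - before[1], 2))
```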

    b) Could you make explicit how your decoders were designed? Especially given that you added null data, did you train individual decoders for one class vs. all other classes (n = 9 + null data) or one class vs. null data?

    We added detail on the decoder training. The section now reads:

    “Decoders were trained using a one-vs-all approach, which means that for each class, a separate classifier was trained using positive examples (target class) and negative examples (all other classes) plus null examples (data from before stimulus presentation, see below). In detail, null data was.”
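    The one-vs-all scheme with null examples can be sketched as label construction: one binary label vector per class, where the target class is positive and both the other stimulus classes and all null (pre-stimulus) samples are negative. The sketch assumes the feature rows are stacked as stimulus trials followed by null samples; the number of null samples is hypothetical, and how the 1:2 ratio discussed below maps onto it is not specified here. Any binary classifier could then be fit on the sensor data with each label vector.

```python
def one_vs_all_labels(trial_classes, n_null, target):
    """Binary training labels for the decoder of class `target`.

    trial_classes: class of each stimulus trial (e.g. 0..9 for ten images).
    n_null: number of null (pre-stimulus) samples appended after the stimulus trials.
    Positives are trials of `target`; negatives are trials of all other classes
    plus every null sample.
    """
    labels = [1 if c == target else 0 for c in trial_classes]
    labels.extend([0] * n_null)
    return labels


def decoder_label_sets(trial_classes, n_null):
    """One label vector per class, i.e. one binary decoder per class."""
    return {c: one_vs_all_labels(trial_classes, n_null, c)
            for c in sorted(set(trial_classes))}


# Ten classes x 50 localizer repetitions; the number of null samples is hypothetical.
trial_classes = [c for c in range(10) for _ in range(50)]
label_sets = decoder_label_sets(trial_classes, n_null=500)
y = label_sets[3]
print(len(label_sets), len(y), sum(y))
```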

    c) Why did you choose a ratio of 1:2 for your null data?

    Our choice of a higher ratio was based on previous publications reporting better TDLM sensitivity with higher ratios, as spatial correlations between sensors decrease. Nevertheless, this choice was not thoroughly investigated beforehand. We have added more information on this to the manuscript.

    d) You could think about putting the questionnaire results into the supplement if they are sanity checks.

    We have added the questionnaire results. However, due to the size of the tables, we have decided to add them as Excel files to the supplementary files of the code repository. We mention the existence of these files in the publication.

    e) Figure 2. There is a typo in D: It says "Precessor Image" instead of "Predecessor Image".

    Fixed typo in figure.

    f) You write "Trials for the localizer task were created from -0.1 to 0.5 seconds relative to visual stimulus onset to train the decoders and for the retrieval task, from 0 to 1.5 seconds after onset of the second visual cue image." But the Figure legend 3D starts at -0.1 seconds for the retrieval test.

    We have now clarified this. For the classifier cross-validation, the transfer sanity check, and the clustered reactivation analysis, we used trials from -0.1 to 0.5 s, whereas for the sequenceness analysis of the retrieval data, we used trials from 0 to 1.5 s.

  6. eLife assessment

    This magnetoencephalography study reports important new findings regarding the nature of memory reactivation during cued recall. It replicates previous work showing that such reactivation can be sequential or clustered, with sequential reactivation being more prevalent in low performers. It adds convincing evidence, even though based on limited amounts of data, that high memory performers tend to show simultaneous (i.e., clustered) reactivation, varying in strength with item distance in the learned graph structure. The study will be of interest to scientists studying memory replay.

  7. Reviewer #1 (Public Review):

    Summary:

    Previous work in humans and non-human animals suggests that during offline periods following learning, the brain replays newly acquired information in a sequential manner. The present study uses an MEG-based decoding approach to investigate the nature of replay/reactivation during a cued recall task directly following a learning session, where human participants are trained on a new sequence of 10 visual images embedded in a graph structure. During retrieval, participants are then cued with two items from the learned sequence, and neural evidence is obtained for the simultaneous or sequential reactivation of future sequence items. The authors find evidence for both sequential and clustered (i.e., simultaneous) reactivation. Replicating previous work, low-performing participants tend to show sequential, temporally segregated reactivation of future items, whereas high-performing participants show more clustered reactivation. Adding to previous work, the authors show that an image's reactivation strength varies depending on its proximity to the retrieval cue within the graph structure.

    Strengths:

    As the authors point out, work on memory reactivation has largely been limited to the retrieval of single associations. Given the sequential nature of our real-life experiences, there is clearly value in extending this work to structured, sequential information. State-of-the-art decoding approaches for MEG are used to characterize the strength and timing of item reactivation. The manuscript is very well written with helpful and informative figures in the main sections. The task includes an extensive localizer with 50 repetitions per image, allowing for stable training of the decoders and the inclusion of several sanity checks demonstrating that on-screen items can be decoded with high accuracy.

    Weaknesses:

    Of major concern, the experiment is not optimally designed for analysis of the retrieval task phase, where only 4 min of recording time and a single presentation of each cue item are available for the analyses of sequential and non-sequential reactivation. In their revision, the authors include data from the learning blocks in their analysis. These blocks follow the same trial structure as the retrieval task, and apart from adding more data points could also reveal a possible shift from sequential to clustered reactivation as learning of the graph structure progresses. The new analyses are not entirely conclusive, possibly because of the variability in the number of learning blocks that participants require to reach criterion. In principle, they suggest that reactivation strength increases from learning (pre-rest) to final retrieval (post-rest).

    On a more conceptual note, the main narrative of the manuscript implies that sequential and clustered reactivation are mutually exclusive, such that a single participant would show either one or the other type. With the analytic methods used here, however, it seems possible to observe both types of reactivation. For example, the observation that mean reactivation strength (across the entire trial, or in a given time window of interest) varies with graph distance does not exclude the possibility that this reactivation is also sequential. In fact, the approach of defining one peak time window of reactivation may bias towards simultaneous, graded reactivation. It would be helpful if the authors could clarify this conceptual point. A strong claim that the two types of reactivation are mutually exclusive would need to be substantiated by further evidence, for instance a suitable metric contrasting "sequenceness" vs "clusteredness".

    On the same point, the non-sequential reactivation analyses use a time window of peak decodability that is determined based on the average reactivation of all future items, irrespective of graph distance. In a sequential forward cascade of reactivations, it could be assumed that the reactivation of near items would peak earlier than the reactivation of far items. In the revised manuscript, the authors now show the "raw" timecourses of item decodability at different graph distances, clearly demonstrating their peak reactivation times, which show convincingly that reactivation for near and far items occurs at very similar time points. The question that remains, therefore, is whether the method of pre-selecting a time window of interest described above could exert a bias towards finding clustered reactivation.
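    One simple way to inspect this concern is to locate the peak time of the mean decoded probability separately for each graph distance, rather than relying on a window chosen from the average across distances. The sketch below is illustrative only: the time grid and both patterns are hypothetical. A clustered, graded pattern gives the same peak time with amplitude falling with distance, whereas a forward cascade gives peaks that shift later with distance.

```python
def peak_times(times_ms, timecourses):
    """Time of maximal mean decoded probability for each graph distance.

    timecourses: dict mapping graph distance -> mean probability per time point.
    """
    peaks = {}
    for distance, tc in timecourses.items():
        best = max(range(len(tc)), key=lambda i: tc[i])
        peaks[distance] = times_ms[best]
    return peaks


times = [0, 50, 100, 150, 200, 250, 300]  # hypothetical time grid (ms)

# Hypothetical patterns, NOT the study's data.
clustered = {  # same peak time, amplitude falls with distance
    1: [0.10, 0.11, 0.13, 0.16, 0.20, 0.24, 0.18],
    2: [0.10, 0.10, 0.12, 0.14, 0.17, 0.20, 0.15],
    3: [0.10, 0.10, 0.11, 0.12, 0.14, 0.16, 0.13],
}
cascade = {  # peaks shift later with distance
    1: [0.10, 0.16, 0.22, 0.15, 0.11, 0.10, 0.10],
    2: [0.10, 0.10, 0.14, 0.20, 0.13, 0.10, 0.10],
    3: [0.10, 0.10, 0.10, 0.13, 0.18, 0.12, 0.10],
}
print(peak_times(times, clustered))
print(peak_times(times, cascade))
```

    Peak latency alone does not rule out sequential structure within a trial; it only checks whether averaged peaks are staggered by distance.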

  8. Reviewer #2 (Public Review):

    Summary:

    The authors investigate replay (defined as sequential reactivation) and clustered reactivation during retrieval of an abstract cognitive map. Replay and clustered reactivation were analysed based on MEG recordings combined with a decoding approach. While the authors report evidence for both replay and clustered reactivation during retrieval, replay was exclusively present in low performers. Further, the authors show that reactivation strength declined with an increasing graph distance.

    Strengths:

    The paper raises interesting research questions, i.e., replay vs. clustered reactivation and how that supports retrieval of cognitive maps. The paper is well written, well structured and easy to follow. The methodological approach is convincing and definitely suited to address the proposed research questions.

    The paper is a great combination between replicating previous findings (Wimmer et al. 2020) with a new experimental approach but at the same time presenting novel evidence (reactivation strength declines as a function of graph distance).

    What I also want to positively highlight is their general transparency. For example, they pre-registered this study but with a focus on a different part of the data and outlined this explicitly in the paper.

    The paper has very interesting findings. However, there are some shortcomings, especially in the experimental design. These are briefly outlined below but are also discussed openly and in detail by the authors.

    Weaknesses:

    The individual findings are interesting. However, due to some shortcomings in the experimental design they cannot be profoundly related to each other. For example, the authors show that replay is present in low but not in high performers with the assumption that high performers tend to simultaneously reactivate items. But then, the authors do not investigate clustered reactivation (= simultaneous reactivation) as a function of performance due to a low number of retrieval trials and ceiling performance in most participants.
    As a consequence of the experimental design, some analyses are underpowered (very low number of trials, n = ~10, and for some analyses, very low number of participants, n = 14).

  9. eLife assessment

    This MEG study reports valuable new findings regarding the nature of memory reactivation during cued recall. It replicates previous work showing that such reactivation can be sequential or clustered, with sequential reactivation being more prevalent in low performers. It adds solid evidence, even though based on limited data, that item strengths during clustered reactivation vary with item distance in the learned graph structure. The study will be of interest to human and rodent neuroscientists working on memory replay.

  10. Reviewer #1 (Public Review):

    Summary:
    Previous work in humans and non-human animals suggests that during offline periods following learning, the brain replays newly acquired information in a sequential manner. The present study uses a MEG-based decoding approach to investigate the nature of replay/reactivation during a cued recall task directly following a learning session, where human participants are trained on a new sequence of 10 visual images embedded in a graph structure. During retrieval, participants are then cued with two items from the learned sequence, and neural evidence is obtained for the simultaneous or sequential reactivation of future sequence items. The authors find evidence for both sequential and clustered (i.e., simultaneous) reactivation. Replicating previous work by Wimmer et al. (2020), low-performing participants tend to show sequential, temporally segregated reactivation of future items, whereas high-performing participants show more clustered reactivation. Adding to previous work, the authors show that an image's reactivation strength varies depending on its proximity to the retrieval cue within the graph structure.

    Strengths:
    As the authors point out, work on memory reactivation has largely been limited to the retrieval of single associations. Given the sequential nature of our real-life experiences, there is clearly value in extending this work to structured, sequential information. State-of-the-art decoding approaches for MEG are used to characterize the strength and timing of item reactivation. The manuscript is very well written with helpful and informative figures in the main sections. The task includes an extensive localizer with 50 repetitions per image, allowing for stable training of the decoders and the inclusion of several sanity checks demonstrating that on-screen items can be decoded with high accuracy.

    Weaknesses:
    Of major concern, the experiment is not optimally designed for analysis of the retrieval task phase, where only 4 min of recording time and a single presentation of each cue item are available for the analyses of sequential and non-sequential reactivation. The authors could consider including data from the (final) learning blocks in their analysis. These blocks follow the same trial structure as the retrieval task, and apart from adding more data points could also reveal important insights regarding a possible shift from sequential to clustered reactivation as learning of the graph structure progresses.

    On a more conceptual note, the main narrative of the manuscript implies that sequential and clustered reactivation are mutually exclusive, such that a single participant would show either one or the other type. With the analytic methods used here, however, it seems possible to observe both types of reactivation. For example, the observation that mean reactivation strength (across the entire trial, or in a given time window of interest) varies with graph distance does not exclude the possibility that this reactivation is also sequential. In fact, the approach of defining one peak time window of reactivation may be biased towards simultaneous, graded reactivation. It would be helpful if the authors could clarify this conceptual point. A strong claim that the two types of reactivation are mutually exclusive would need to be supported by further evidence, for instance, a metric contrasting sequenceness vs clusteredness.

    On the same point, the non-sequential reactivation analyses often use a time window of peak decodability that appears to be determined based on the average reactivation of all future items, irrespective of graph distance. In a sequential forward cascade of reactivations, it seems reasonable to assume that the reactivation of near items would peak earlier than the reactivation of far items. The manuscript would be strengthened by showing the "raw" timecourses of item decodability at different graph distances, clearly demonstrating their peak reactivation times.

  11. Reviewer #2 (Public Review):

    Summary:
    The authors investigate replay (defined as sequential reactivation) and clustered reactivation during retrieval of an abstract cognitive map. Replay and clustered reactivation were analysed based on MEG recordings combined with a decoding approach. While the authors report evidence for both replay and clustered reactivation during retrieval, replay was exclusively present in low performers. Further, the authors show that reactivation strength declined with an increasing graph distance.

    Strengths:
    The paper raises interesting research questions, i.e., replay vs. clustered reactivation and how that supports retrieval of cognitive maps. The paper is well-written, well-structured, and easy to follow. The methodological approach is convincing and definitely suited to address the proposed research questions.

    The paper is a great combination between replicating previous findings (Wimmer et al. 2020) with a new experimental approach but at the same time presenting novel findings (reactivation strength declines as a function of graph distance).
    What I also want to positively highlight is their transparency. They pre-registered this study but with a focus on a different part of the data and outlined this explicitly in the paper.

    The paper has very interesting, individual findings but there are some shortcomings.

    Weaknesses:
    Even though the individual findings are interesting, it is not easy to grasp how they are related. For example, the authors show that replay is present in low but not in high performers with the assumption that high performers tend to simultaneously reactivate items. But then, the authors do not investigate clustered reactivation (= simultaneous reactivation) as a function of performance (due to ceiling effects for most participants).

    Unfortunately, the evidence for clustered reactivation is not well supported by the analysis approach and the observed evidence. The analysis approach still holds the possibility of replay driving the observed clustered reactivation effect.

    A third shortcoming is that at least some analyses are underpowered (very low number of trials, n = ~10, and for some analyses, very low number of participants, n = 14). In both cases (low trial number and low participant number) the n could be increased by including the learning part in the analyses as well. It is not clear to me why the authors restricted their analyses to the retrieval period only (especially given that participants also have to retrieve during learning).