Neural Correlates and Reinstatement of Recent and Remote Memory: A Comparison Between Children and Young Adults

    This paper provides potentially useful insight into why memory consolidation may differ between children (5-7 years of age) and adults. The work hints at developmental differences in neural engagement during the retrieval of recent and remote memories. However, there are several major issues with the experimental design and analyses that render the evidence supporting the authors' main claims incomplete.

Children showed less robust memory consolidation across short and long delay compared to young adults.
From short to long delay, children show differential neural upregulation for remote versus recent memory compared to young adults.
Over time, both children and young adults showed reduced scene-specific reinstatement of neural patterns.
Children relied more on gist-like neural reinstatement in anterior hippocampal and medial prefrontal brain regions.

Memory consolidation tends to be less robust in childhood than adulthood. However, little is known about the corresponding functional differences in the developing brain that may underlie age-related differences in retention of memories over time. This study examined system-level memory consolidation of object-scene associations after learning (immediate delay), one night of sleep (short delay), as well as two weeks (long delay) in 5-to-7-year-old children (n = 49) and in young adults (n = 39), as a reference group with mature consolidation systems. Particularly, we characterized how functional neural activation and reinstatement of neural patterns change over time, assessed by functional magnetic resonance imaging combined with representational similarity analysis (RSA). Our results showed that memory consolidation in children was less robust and strong (i.e., more forgetting) compared to young adults. Contrasting correctly retained remote versus recent memories across time delay, children showed less upregulation in posterior parahippocampal gyrus, lateral occipital cortex, and cerebellum than adults. In addition, both children and adults showed decrease in scene-specific neural reinstatement over time, indicating time-related decay of detailed differentiated memories. At the same time, we observed more generic gist-like neural reinstatement in medial-temporal and prefrontal brain regions uniquely in children, indicating qualitative difference in memory trace in children.
Taken together, 5-to-7-year-old children, compared to young adults, show less robust memory consolidation, possibly due to difficulties in engaging in differentiated neural reinstatement in neocortical mnemonic regions during retrieval of remote memories, coupled with relying more on gist-like generic neural reinstatement.

  2. Reviewer #1 (Public Review):


    This paper by Schommartz and colleagues investigates the neural basis of memory reinstatement as a function of both how recently the memory was formed (recent, remote) and its development (children, young adults). The core question is whether memory consolidation processes as well as the specificity of memory reinstatement differ with development. A number of brain regions showed a greater activation difference for recent vs. remote memories at the long versus shorter delay specifically in adults (cerebellum, parahippocampal gyrus, LOC). A different set showed decreases in the same comparison, but only in children (precuneus, RSC). The authors also used neural pattern similarity analysis to characterize reinstatement, though still in this revised paper I have substantive concerns about how the analyses were performed. While scene-specific reinstatement decreased for remote memories in both children and adults, claims about its presence cannot be made given the analyses. Gist-level reinstatement was observed in children but not adults, but I also have concerns about this analysis. Broadly, the behavioural and univariate findings are consistent with the idea that memory consolidation differs between children and adults in important ways, and takes a step towards characterizing how.


    The topic and goals of this paper are very interesting. As the authors note, there is little work on memory consolidation over development, and as such this will be an important data point in helping us begin to understand these important differences. The sample size is great, particularly given this is an onerous, multi-day experiment; the authors are to be commended for that. The task design is also generally well controlled, for example as the authors include new recently learned pairs during each session.


    As noted above and in my review of the original submission, the pattern similarity analysis for both item and category-level reinstatement was performed in a way that is not interpretable given concerns about temporal autocorrelation within scanning run. Unfortunately these issues remain of concern in this revision because they were not rectified. Most of my review focuses on this analytic issue, though I also outline additional concerns.

    (1) The pattern similarity analyses are largely uninterpretable due to how they were performed.

    (a) First, the scene-specific reinstatement index: The authors have correlated a neural pattern during a fixation cross (delay period) with a neural pattern associated with viewing a scene as their measure of reinstatement. The main issue with this is that these events always occurred back-to-back in time. As such, the two patterns will be similar due simply to the temporal autocorrelation in the BOLD signal. Because of the issues with temporal autocorrelation within scanning run, it is always recommended to perform such correlations only across different runs. In this case, the authors always correlated patterns extracted from the same run, and which moreover have temporal lags that are perfectly confounded with their comparison of interest (i.e., from Fig 4A, the "scene-specific" comparisons will always be back-to-back, having a very short temporal lag; "set-based" comparisons will be dispersed across the run, and therefore have a much higher lag). The authors' within-run correlation approach also yields correlation values that are extremely high - much higher than would be expected if this analysis was done appropriately. The way to fix this would be to restrict the analysis to only cross-run comparisons, which is not possible given the design.

    To remedy this, in the revision the authors have said they will refrain from making conclusions about the presence of scene-specific reinstatement (i.e., reinstatement above baseline). While this itself is an improvement from the original manuscript, I still have several concerns. First, this was not done thoroughly and at times conclusions/interpretations still seem to imply or assume the presence of scene reinstatement (e.g., line 979-985, "our research supports the presence of scene-specific reinstatement in 5-to-7-year-old children"; line 1138). Second, the authors' logic for the neural-behavioural correlations in the PLSC analysis involved restricting to regions that showed significant reinstatement for the gist analysis, which cannot be done for the analogous scene-specific reinstatement analysis. This makes it challenging to directly compare these two analyses since one was restricted to a small subset of regions and only children (gist), while scene reinstatement included both groups and all ROIs. Third, it is also unclear whether children and adults' values should be directly comparable given pattern similarity can be influenced by many factors like motion, among other things.

    My fourth concern with this analysis relates to the lack of regional specificity of the effects. All ROIs tested showed a virtually identical pattern: "Scene-specific reinstatement" decreased across delays, and was greater in children than adults. I believe control analyses are needed to ensure artifacts are not driving these effects. This would greatly strengthen the authors' ability to draw conclusions from the "clean" comparison of day 1 vs. day 14. (A) The authors should present results from a control ROI that should absolutely not show memory reinstatement effects (e.g., white matter?). Results from the control ROI should look very different - should not differ between children and adults, and should not show decreases over time. (B) Do the recent items from day 1 vs. day 14 differ? If so, this could suggest something is different about the later scans (and if not, it would be reassuring). (C) If the same analysis was performed comparing the object cue and immediately following fixation (rather than the fixation and the immediately following scene), the results should look very different. I would argue that this should not be an index of reinstatement at all since it involves something presented visually rather than something reinstated (i.e., the scene picture is not included in this comparison). If this control analysis were to show the same effects as the primary analysis, this would be further evidence that this analysis is uninterpretable and hopelessly confounded.

    (b) For the category-based neural reinstatement: (1) This suffers from the same issue of correlations being performed within run. Again, to correct this the authors would need to restrict comparisons to only across runs (i.e., patterns from run 1 correlated with patterns for run 2 and so on). The authors in their response letter have indicated that because the patterns being correlated are not derived from events in close temporal proximity, they should not suffer from the issue of temporal autocorrelation. This is simply not true. For example, see the paper by Prince et al. (eLife 2022; on GLMsingle). This is not the main point of Prince et al.'s paper, but it includes a nice figure that shows that, using standard modelling approaches, the correlation between (same-run) patterns can be artificially elevated for lags as long as ~120 seconds (and can even be artificially reduced after that; Figure 5 from that paper) between events. This would affect many of the comparisons in the present paper. The cleanest way to proceed is to simply drop the within-run comparisons, which I believe the authors can do and yet they have not. Relatedly, in the response letter the authors say they are focusing mainly on the change over time for reinstatement at both levels including the gist-type reinstatement; however, this is not how it is discussed in the paper. They in fact are mainly relying on differences from zero, as children show some "above baseline" reinstatement while adults do not, but I believe there were no significant differences over time (i.e., the findings the authors said they would lean on primarily, as they are arguably the most comparable). (2) This analysis uses a different approach of comparing fixations to one another, rather than fixations to scenes. In their response letter and the revised paper, the authors do provide a bit of reasoning as to why this is the most sensible. 
However, it is still not clear to me whether this is really "reinstatement" which (in my mind) entails the re-evoking of a neural pattern initially engaged during perception. Rather, could this be a shared neural state that is category specific? In any case, I think additional information should be added to the text to clarify that this definition differs from others in the literature. The authors might also consider using some term other than reinstatement. Again (as I noted in my prior review), the finding of no category-level reinstatement in adults is surprising and confusing given prior work and likely has to do with the operationalization of "reinstatement" here. I was not quite sure about the explanation provided in the response letter, as category-level reinstatement is quite widespread in the brain for adults and is robust to differences in analytic procedures etc. (3) Also from a theoretical standpoint-I'm still a bit confused as to why gist-based reinstatement would involve reinstatement of the scene gist, rather than the object's location (on the screen) gist. Were the locations on the screen similar across scene backgrounds from the same category? It seems like a different way to define memory retrieval here would be to compare the neural patterns when cued to retrieve the same vs. similar (at the "gist" level) vs. different locations across object-scene pairs. This is somewhat related to a point from my review of the initial version of this manuscript, about how scene reinstatement is not necessary. The authors state that participants were instructed to reinstate the scene, but that does not mean they were actually doing it. The point that what is being measured via the reinstatement analyses is actually not necessary to perform the task should be discussed in more detail in the paper.
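    The cross-run restriction recommended above can be illustrated with a minimal sketch. It assumes one activity pattern per trial (rows of a trials-by-voxels array) and a run label per trial; the function name, the toy data, and the run assignment are hypothetical and are not taken from the authors' pipeline.

```python
import numpy as np

def cross_run_similarity(patterns, runs):
    """Mean Pearson r over pattern pairs that come from different runs.

    patterns: array of shape (n_trials, n_voxels), one pattern per trial
    runs:     array of shape (n_trials,), run label of each trial
    Returns the mean correlation and the number of pairs that entered it.
    """
    patterns = np.asarray(patterns, dtype=float)
    runs = np.asarray(runs)
    r = np.corrcoef(patterns)                      # trial-by-trial correlations
    different_run = runs[:, None] != runs[None, :] # True where runs differ
    # Keep each unordered pair once (upper triangle, no diagonal).
    upper = np.triu(np.ones_like(different_run, dtype=bool), k=1)
    mask = different_run & upper
    return r[mask].mean(), int(mask.sum())

# Toy data (hypothetical): 6 trials, 50 voxels, 2 trials per run.
rng = np.random.default_rng(0)
patterns = rng.standard_normal((6, 50))
runs = np.array([1, 1, 2, 2, 3, 3])
mean_r, n_pairs = cross_run_similarity(patterns, runs)
```

    With three runs of two trials each, 12 of the 15 possible pairs survive the mask; the three same-run pairs, which are the ones exposed to within-run autocorrelation, are dropped.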

    (2) Inspired by another reviewer's comment, it is unclear to me the extent to which age group differences can be attributed to differences in age/development versus memory strength. I liked the other reviewer's suggestions about how to identify and control for differences in memory strength, which I don't think the authors actually did in the revision. They instead showed evidence that memory strength does seem to be lower in children, which indicates this is an interpretive confound. For example, the reviewer's suggestion of performing analyses on subsets of participants who were actually matched in initial learning/memory performance would have been very informative. As it is, the authors didn't really control for memory strength adequately in my opinion, and as such their conclusions about children vs. adults could have been reframed as people with weak vs. strong memories. This is obviously a big drawback given what the authors want to conclude. Relatedly, I'm not sure the DDM was incorporated as the reviewer was suggesting; at minimum I think the authors need to do more work in the paper to explain what this means and why it is relevant. (I understand putting it in the supplement rather than the main paper, but I still wanted to know more about what it added from an interpretive perspective.)

    (3) Some of the univariate results reporting is a bit strange, as they are relying upon differences between retrieval of 1- vs. 14-day memories in terms of the recent vs. remote difference, and yet don't report whether the regions are differently active for recent and remote retrieval. For example in Figure 3A, neither anterior nor posterior hippocampus seem to be differentially active for recent vs. remote memories for either age group (i.e., all data is around 0). Precuneus also interestingly seems to show numerically recent>remote (values mostly negative), whereas most other regions show the opposite. This difference from zero (in either direction) or lack thereof seems important to the message. In response to this comment on the original manuscript, the authors seem to have confirmed that hippocampal activity was greater during retrieval than implicit baseline. But this was not really my question - I was asking whether hippocampus is (and other ROIs in this same figure are) differently engaged for recent vs. remote memories.

    (4) Related to point 3, the claims about hippocampus with respect to multiple trace theory feel very unsupported by the data. I believe the authors want to conclude that children's memory retrieval shows reliance on hippocampus irrespective of delay, presumably because this is a detailed memory task. However the authors have not really shown this; all they have shown is that hippocampal involvement (whatever it is) does not vary by delay. But we do not have compelling evidence that the hippocampus is involved in this task at all. That hippocampus is more active during retrieval than implicit baseline is a very low bar and does not necessarily indicate a role in memory retrieval. If the authors want to make this claim, more data are needed (e.g., showing that hippocampal activity during retrieval is higher when the upcoming memory retrieval is successful vs. unsuccessful). In the absence of this, I think all the claims about multiple trace theory supporting retrieval similarly across delays and that this is operational in children are inappropriate and should be removed.

    (5) There are still not enough methodological details in the main paper to make sense of the results. Some of these problems were addressed in the revision but others remain. For example, a couple of things that were unclear: that initially learned locations were split, where half were tested again at day 1 and the other half at day 14; what specific criterion was used to pick the 'well-learned' associations that were used for comparisons at different delay periods (object-scene pairs that participants remembered accurately in the last repetition of learning? Or across all of learning?).

    (6) I still find the revised Introduction a bit unclear. I appreciated the added descriptions of different theories of consolidation, though the order of presented points is still a bit hard to follow. Some of the predictions I also find a bit confusing as laid out in the introduction. (1) As noted in the paper multiple trace theory predicts that hippocampal involvement will remain high provided memories retained are sufficiently high detail. The authors however also predict that children will rely more on gist (than detailed) memories than adults, which would seem to imply (combined with the MTT idea) that they should show reduced hippocampal involvement over time (while in adults, it should remain high). However, the authors' actual prediction is that hippocampus will show stable involvement over time in both kids and adults. I'm having a hard time reconciling these points. (2) With respect to the extraction of gist in children, I was confused by the link to Fuzzy Trace Theory given the children in the present study are a bit young to be showing the kind of gist extraction shown in the Brainerd & Reyna data. Would 5-7 year olds not be more likely to show reliance on verbatim traces under that framework? Also from a phrasing perspective, I was confused about whether gist-like information was something different from just gist in this sentence: "children may be more inclined to extract gist information at the expense of detailed or gist-like information." (p. 8) - is this a typo?

    (7) For the PLSC, if I understand this correctly, the profiles were defined for showing associations with behaviour across age groups. (1) As such, is it not "double dipping" to then show that there is an association between brain profile and behaviour-must this not be true by definition? If I am mistaken, it might be helpful to clarify this in the paper. (2) In addition, I believe for the univariate and scene-specific reinstatement analyses these profiles were defined across both age groups. I assume this doesn't allow for separate definition of profiles across the two groups (i.e., a kind of "interaction"). If this is the case, it makes sense that there would not be big age differences... the profiles were defined for showing an association across all subjects. If the authors wanted to identify distinct profiles in children and adults they may need to run another analysis. (3) Also, as for differences between short delay brain profile and long delay brain profile for the scene-specific reinstatement - there are 2 regions that become significant at long delay that were not significant at a short delay (PC, and CE). However, given there are ceiling effects in behaviour at the long but not short delay, it's unclear if this is a meaningful difference or just a difference in sensitivity. Is there a way to test whether the profiles are statistically different from one another? (4) As I mentioned above, it also was not ideal in my opinion that all regions were included for the scene-specific reinstatement due to the authors' inability to have an appropriate baseline and therefore define above-chance reinstatement. It makes these findings really challenging to compare with the gist reinstatement ones.

    (8) I would encourage the authors to be specific about whether they are measuring/talking about memory representations versus reinstatement, unless they think these are the same thing (in which case some explanation as to why would be helpful). For example, especially under the Fuzzy Trace framework, couldn't someone maintain both verbatim and gist traces of a memory yet rely more on one when making a memory decision?

    (9) With respect to the learning criteria - it is misleading to say that "children needed between two to four learning-retrieval cycles to reach the criterion of 83% correct responses" (p. 9). Four was the maximum, and looking at the Figure 1C data it appears as though there were at least a few children who did not meet the 83% minimum. I believe they were included in the analysis anyway? Please clarify. Was there any minimum imposed for inclusion?

    (10) For the gist-like reinstatement PLSC analysis, results are really similar at short and long delays and yet some of the text seems to imply specificity to the long delay. One is a trend and one is significant (p. 31), but surely these two associations would not be statistically different from one another?

    (11) As a general comment, I had a hard time tying all of the (many) results together. For example adults show more mature neocortical consolidation-related engagement, which the authors say is going to create more durable detailed memories, but under multiple trace theory we would generally think of neocortical representations as providing more schematic information. If the authors could try to make more connections across the different neural analyses, as well as tie the neural findings in more closely with the behaviour & back to the theoretical frameworks, that would be really helpful.

  3. Reviewer #2 (Public Review):

    Schommartz et al. present a manuscript characterizing neural signatures of reinstatement during cued retrieval of middle-aged children compared to adults. The authors utilize a paradigm where participants learn the spatial location of semantically related item-scene memoranda which they retrieve after short or long delays. The paradigm is especially strong as the authors include novel memoranda at each delayed time point to make comparisons across new and old learning. In brief, the authors find that children show more forgetting than adults, and adults show greater engagement of cortical networks after longer delays as well as stronger item-specific reinstatement. Interestingly, children show more category-based reinstatement, however, evidence supports that this marker may be maladaptive for retrieving episodic details. The question is extremely timely both given the boom in neurocognitive research on the neural development of memory, and the dearth of research on consolidation in this age group. Also, the results provide novel insights into why consolidation processes may be disrupted in children.

  4. Author response:

    The following is the authors’ response to the original reviews.

    Reviewer #1 (Public Reviews):


    This paper by Schommartz and colleagues investigates the neural basis of memory reinstatement as a function of both how recently the memory was formed (recent, remote) and its development (children, young adults). The core question is whether memory consolidation processes as well as the specificity of memory reinstatement differ with development. A number of brain regions showed a greater activation difference for recent vs. remote memories at the long versus shorter delay specifically in adults (cerebellum, parahippocampal gyrus, LOC). A different set showed decreases in the same comparison, but only in children (precuneus, RSC). The authors also used neural pattern similarity analysis to characterize reinstatement, though I have substantive concerns about how this analysis was performed and as such will not summarize the results. Broadly, the behavioural and univariate findings are consistent with the idea that memory consolidation differs between children and adults in important ways, and takes a step towards characterizing how.


    The topic and goals of this paper are very interesting. As the authors note, there is little work on memory consolidation over development, and as such this will be an important data point in helping us begin to understand these important differences. The sample size is great, particularly given this is an onerous, multi-day experiment; the authors are to be commended for that. The task design is also generally well controlled, for example as the authors include new recently learned pairs during each session.


    As noted above, the pattern similarity analysis for both item and category-level reinstatement was performed in a way that is not interpretable given concerns about temporal autocorrelation within the scanning run. Below, I focus my review on this analytic issue, though I also outline additional concerns.

    We thank the reviewer for both the positive and critical appraisal of our paper.

    (1) The pattern similarity analyses were not done correctly, rendering the results uninterpretable (assuming my understanding of the authors' approach is correct).

    a. First, the scene-specific reinstatement index: The authors have correlated a neural pattern during a fixation cross (delay period) with a neural pattern associated with viewing a scene as their measure of reinstatement. The main issue with this is that these events always occurred back-to-back in time. As such, the two patterns will be similar due simply to the temporal autocorrelation in the BOLD signal. Because of the issues with temporal autocorrelation within the scanning run, it is always recommended to perform such correlations only across different runs. In this case, the authors always correlated patterns extracted from the same run, which moreover have temporal lags that are perfectly confounded with their comparison of interest (i.e., from Fig 4A, the "scene-specific" comparisons will always be back-to-back, having a very short temporal lag; "set-based" comparisons will be dispersed across the run, and therefore have a much higher lag). The authors' within-run correlation approach also yields correlation values that are extremely high - much higher than would be expected if this analysis was done appropriately. The way to fix this would be to restrict the analysis to only cross-run comparisons, but I don't believe this is possible unfortunately given the authors' design; I believe the target (presumably reinstated) scene only appears once during scanning, so there is no separate neural pattern during the presentation of this picture that they can use. For these reasons, any evidence for "significant scene-specific reinstatement" and the like is completely uninterpretable and would need to be removed from the paper.

    We thank the reviewer for this important input. We acknowledge that our study design leads to temporal autocorrelation in the BOLD signal when calculating RSA between fixation and scene time windows. We also recognize that we cannot interpret the significance of scene-specific reinstatement compared to zero and have accordingly removed this information. Nevertheless, our primary objective was to investigate changes in scene-specific reinstatement in relation to the different time delays of retrieval. Given that the retrieval procedure is the same over time and presumably similarly influenced by temporal autocorrelations, we argue that our results must be attributed to the relative differences in reinstatement across recent and remote trials. Bearing this in mind, we argue that our results can be interpreted in terms of delay-related changes in reinstatement. This information is discussed in pp. 21, 40 of the manuscript.

    We agree with the reviewer that cross-run comparisons would be extremely interesting. This could be achieved by introducing the same items repeatedly across different runs, which was not possible in our current setup since we were interested in single-exposure retrieval and faced practical time restrictions in scanning children. We have introduced this idea in Limitations and Discussion sections (pp. 40, 44) of the manuscript to inform future studies.

    Finally, thanks to the reviewer’s comment, we identified a bug in the final steps of our RSA calculation. Fisher’s z-transformation was incorrectly applied to r-1 values, resulting in abnormally high values. We apologize for this error. We have revised the scripts and rectified the bug by correctly applying Fisher’s z-transformation to the r similarity values. We also adjusted the methods description figure accordingly (Figure 5, p. 22). This adjustment led to slightly altered reinstatement indices. Nevertheless, the overall pattern of delay-related attenuation in the scene-specific reinstatement index, observed in both children and adults, remains consistent. Similarly, we observed gist-like reinstatement uniquely in children.
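    To illustrate why the input to the transformation matters, the sketch below applies Fisher's z (the inverse hyperbolic tangent) to similarity values r and, for contrast, to dissimilarity values 1 - r. Reading the "r-1 values" mentioned above as a dissimilarity of the form 1 - r is an assumption made only for this illustration; the example r values are hypothetical and are not the study's data.

```python
import numpy as np

def fisher_z(r):
    """Fisher z-transformation: z = arctanh(r), defined for -1 < r < 1."""
    return np.arctanh(np.asarray(r, dtype=float))

# Hypothetical, modest pattern-similarity values.
r = np.array([0.05, 0.10, 0.20])

# Intended use: transform the similarity values themselves.
z_from_similarity = fisher_z(r)

# Assumed form of the bug (illustration only): transforming 1 - r instead.
# Small r gives 1 - r close to 1, where arctanh grows steeply.
z_from_dissimilarity = fisher_z(1.0 - r)
```

    For small correlations the transformation leaves r almost unchanged, whereas the same correlations entered as 1 - r map to values above 1, which would be in line with the abnormally high values described.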

    b. From a theoretical standpoint, I believe the way this analysis was performed considering the fixation and the immediately following scene also means that the differences between recent and remote could have to do with either the reactivation (processes happening during the fixation, presumably) or differences in the processing of the stimulus itself (happening during the scene presentation). For example, people might be more engaged with the more novel scenes (recent) and therefore process those scenes more; such a difference would be interpreted in this analysis as having to do with reinstatement, but in fact could be just related to the differential scene processing/recognition, etc.

    Thank you for your insightful comments. We acknowledge the theoretical concerns raised about distinguishing between the effects of reactivation processes occurring during fixation and differential processing of the stimulus itself during scene presentation. Specifically, we note the concern that engagement levels with recent scenes could result in enhanced processing, which might be misattributed to memory reinstatement mechanisms.

    We argue, however, that during scene presentation, scenes are processed more “memory-wise” rather than “perception-wise”, since both recent and remote memories are well-learned, as we included only correctly recalled memories in the analysis.

    We concur that scene presentations entail perceptual processing; however, such processing would be consistent across all items, given that they were presented with the same repeated learning procedure, rendering them equally familiar to participants. In addition, we would argue that distinct activation patterns elicited during varying delays are more likely attributable to memory-related processing, since participants actively engaged in a memory-based decision-making task during these intervals. We have incorporated this rationale into the discussion section of our manuscript (p. 40).

    With this in mind, we hypothesized that in case of “memory-wise” processing, the neural engagement during the scene time window should be higher for remote compared to recent items, and this increases with passing time as more control and effort should be exhibited during retrieval due to reorganized and distributed nature of memories. If the scenes are processed more “perception-wise”, we would expect higher neural engagement during the retrieval of recent compared to remote items. Our exploratory analysis (detailed overview in supplementary materials, Figure S3, Table S9) revealed a higher neural activation for remote compared to recent items in medial temporal, prefrontal, occipital and cerebellar brain regions, supporting the notion of “memory-wise” processes during scene time window. However, this exploratory analysis cannot provide a direct solution to the reviewer’s concern as our paradigm per se cannot arbitrate between “memory-wise” and “perception-wise” nature of retrieval. We added the point to the discussion (see p. 40).

    c. For the category-based neural reinstatement:

    (1) This suffers from the same issue of correlations being performed within the run. Again, to correct this the authors would need to restrict comparisons to only across runs (i.e., patterns from run 1 correlated with patterns for run 2 and so on). With this restriction, it may or may not be possible to perform this analysis, depending upon how the same-category scenes are distributed across runs. However, there are other issues with this analysis, as well.

    (2) This analysis uses a different approach of comparing fixations to one another, rather than fixations to scenes. The authors do not motivate the reason for this switch. Please provide reasoning as to why fixation-fixation is more appropriate than fixation-scene similarity for category-level reinstatement, particularly given the opposite was used for item-level reinstatement. Even if the analyses were done properly, it would remain hard to compare them given this difference in approach.

    (3) I believe the fixation cross with itself is included in the "within category" score. Is this not a single neural pattern correlated with itself, which will yield maximal similarity (Pearson r = 1) or minimal dissimilarity (1 - Pearson r = 0)? Including these comparisons in the averages for the within-category score will inflate the difference between the "within-category" and "between-category" comparisons. These (e.g., forest1-forest1) should not be included in the within-category comparisons considered; rather, they should be excluded, so the fixations are always different but sometimes the comparisons are two retrievals of the same scene type (forest1-forest2), and other times different scene types (forest1-field1).

    (4) It is troubling that the results from the category reinstatement metric do not seem to conceptually align with past work; for example, a lot of work has shown category-level reinstatement in adults. Here the authors do not show any category-level reinstatement in adults (yet they do in children), which generally seems extremely unexpected given past work and I would guess has to do with the operationalization of the metric.

    Thank you for this important input regarding category-based reinstatement.

    (1) The distribution of within-category items across runs was approximately similar and balanced. Additionally, within runs, they were presented randomly without close temporal proximity. Based on this arrangement, we believe that the issue of close temporal autocorrelation, as pointed out by the reviewer in the context of scene-specific reinstatement, may not apply to the same extent here. Again, our focus is not on the absolute level of category-based reinstatement, but the relative difference across conditions (recent vs. remote short delay vs. remote long delay) which are equally impacted by the autocorrelations.

    (2) We apologize for not motivating this analysis further. Whereas the scene-reinstatement index (i.e., fixation to scene correlation) gives us a measure of the pre-activation of a concrete scene (e.g., a yellow forest in autumn), the gist-like reinstatement gives us a measure of the pre-activation of a whole category of scenes (e.g., forests). Critically, our window of interest is the fixation period for both sets of analyses (in the absence of any significant visual input). The scene-specific reinstatement uses the scene window as a neural template against which the fixation period can be compared, while the gist-like reinstatement compares the similarity of reactivation patterns for trials that come from the same category but differ in the exact memory content. The reinstatement of more generic, gist-like memory (e.g., forest) across multiple trials should yield more similar neural activation patterns. Significant gist-like reinstatement would suggest that neural patterns for scenes within the same category are more generic, as indicated by higher similarity among them. On the other hand, a more detailed reinstatement of specific types of forests (e.g., a yellow forest in autumn, green pine trees, a bare-leaved forest in spring, etc.) that differ in various dimensions could result in neural activation patterns that are as dissimilar as those seen in the reinstatement of scenes from entirely different categories. Through this methodology, we could distinguish between more generic, gist-like reinstatement and more specific, detailed reinstatement. This is now clarified in the manuscript, see p. 25.

    (3) We apologize for the confusion caused by the figure and analysis description. In our analysis, we indeed excluded the correlation of the fixation cross with itself. Consequently, the diagonal in the figure should be blank to indicate this. This is now revised in the manuscript (Figure 7B and in Methods).

    (4) We appreciate your concern and recognize that the terminology we used might not align perfectly with the conventional understanding of category-based reinstatement. Typically, category-level neural representations (as discussed in Polyn et al., 2005; Jafarpour et al., 2014; among others) are investigated to identify specific brain areas associated with encoding/perception of scenes or faces. Our aim, however, was to explore the mnemonic reinstatement of highly detailed scenes that were elaborately encoded, with the hypothesis that substantial representational transformations would occur over time and vary with age. This hypothesis is based on the memory literature, including the Fuzzy-Trace Theory, the Contextual Binding Theory, and the Trace Transformation Theory (Brainerd & Reyna, 1998; Yonelinas, 2019; Moscovitch & Gilboa, 2023). Therefore, we renamed 'category-based' reinstatement to 'gist-like' reinstatement, which clarifies our concept and better aligns it with existing literature.

    We anticipated that young adults, having the ability to retain detailed narratives post-encoding, would demonstrate a reinstatement of scenes with distinct details, making these scenes dissimilar from each other (see similar findings in Sommer et al., 2021). In contrast, given the anticipated lesser strategic elaboration during learning in children, we hypothesized that they would demonstrate a shallower, more gist-like reinstatement (for instance, children recalling a forest or a field in a general sense without specific details or vivid imagery). This could result in higher category-based similarity, as children might reinstate a more generic forest concept.

    We did not gather additional data on the verbal quality of reinstatement due to the limited scanning time available for children, so these assumptions remain unverified. However, anecdotal observations post-retrieval indicated that adults often reported very vivid scenes associated with clear narrative recall. In contrast, children frequently described more vague memories (e.g., “I know it was a forest”) without specific details. Future studies should include measures to assess the quality of reinstatement, potentially outside the scanning environment.
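
    The within- versus between-category comparison discussed in points (2) and (3) above can be sketched as follows. This is a minimal illustration, not the analysis code used for the manuscript: the function names, the toy voxel patterns, and the choice to summarize the index as a simple difference of mean Pearson correlations are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length activation patterns."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def gist_index(patterns, categories):
    """Mean within-category minus mean between-category similarity.

    combinations() only yields pairs of *different* trials, so the
    correlation of a fixation pattern with itself (r = 1) never enters
    the within-category average.
    """
    within, between = [], []
    for i, j in combinations(range(len(patterns)), 2):
        r = pearson_r(patterns[i], patterns[j])
        (within if categories[i] == categories[j] else between).append(r)
    return mean(within) - mean(between)


# Hypothetical toy data: four fixation-period patterns over five voxels.
patterns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],  # forest, trial 1
    [1.1, 2.2, 2.9, 4.1, 5.3],  # forest, trial 2
    [5.0, 3.0, 4.0, 1.0, 2.0],  # field, trial 1
    [4.8, 3.1, 4.2, 0.9, 2.2],  # field, trial 2
]
categories = ["forest", "forest", "field", "field"]
index = gist_index(patterns, categories)
```

    A positive index indicates that fixation-period patterns are more similar for trials of the same category than for trials of different categories, which is the signature described above as gist-like reinstatement.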

    (2) I did not see any compelling statistical evidence for the claim of less robust consolidation in children.

    Specifically in terms of the behavioral results of retention of the remote items at 1 vs 14 days, shown in Figure 2B, the authors conclude that memory consolidation is less robust in children (line 246). Yet they do not report statistical evidence for this point, as there was no interaction of this effect with the age group. Children had worse memory than adults overall (in terms of a main effect - i.e. across recent and remote items). If it were consolidation-specific, one would expect that the age differences are bigger for the remote items, and perhaps even most exaggerated for the 14-day-old memories. Yet this does not appear to be the case based on the data the authors report. Therefore, the behavioral differences in retention do not seem to be consolidation specific, and therefore might have more to do with differences in encoding fidelity or retrieval processes more generally across the groups. This should be considered when interpreting the findings.

    Thank you for highlighting this important issue. We acknowledge that our initial description and depiction of our behavioral findings may not have effectively conveyed the main message about memory consolidation. Therefore, we have revised the behavioral results section (see pp. 12-14) to communicate our message more clearly.

    As detailed in the methods section, we reported retention rates only for those items that were correctly (100%) learned on day 0, day 1, and day 14. This approach meant that different participants had varying numbers of items learned correctly. However, this strategy allowed us to address our primary question: whether memory consolidation, based on all items initially encoded successfully, is comparably robust between groups.

    To illustrate more clearly the change in retention rate slopes over time for recently learned items (i.e., retrieved 30 minutes after learning), short delay remote items, and long delay remote items, relative to the initially correctly learned items, we conducted the following analysis: after observing no differences between sessions in both age groups for recent items on days 1 and 14, we combined the recent items. This approach enabled us to investigate how the slope of memory retention for initially correctly learned items (with a baseline of 100%) changes over time. We observed a significant interaction between item type (recent, short delay remote, long delay remote) and group (F(3,250) = 17.35, p < .001, ω² = .16). Follow-up analyses of this interaction revealed significantly less robust memory consolidation across all delay times in children compared to young adults. This information has been added to the manuscript on pp. 12-14. We have also updated the figures, incorporating the baseline of 100% correct performance.
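
    The retention rate relative to a 100% baseline can be illustrated with a short sketch. It assumes that the rate is the percentage of initially correctly learned items that are still answered correctly at retrieval; the function name and the item labels are hypothetical and serve only as an example.

```python
def retention_rate(learned_correct, retrieved_correct):
    """Percentage of initially correctly learned items still correct at retrieval."""
    learned = set(learned_correct)
    if not learned:
        raise ValueError("no correctly learned items to retain")
    # Items that were never learned correctly do not count towards retention.
    retained = learned & set(retrieved_correct)
    return 100.0 * len(retained) / len(learned)


# Hypothetical item labels for one participant.
learned = {"fox", "owl", "deer", "hare"}   # correct at the end of learning
correct_day14 = {"fox", "deer", "boar"}    # "boar" was not learned correctly
rate = retention_rate(learned, correct_day14)  # 2 of 4 retained
```

    Because the denominator is each participant's own set of correctly learned items, every participant starts from 100%, even though the number of items entering the calculation differs between participants.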

    (3) Please clarify which analyses were restricted to correct retrievals only. The univariate analyses states that correct and incorrect trials were modelled separately but does not say which were considered in the main contrast (I assume correct only?). The item specific reinstatement analysis states that only correct trials were considered, but the category-level reinstatement analysis does not say. Please include this detail.

    Thank you for bringing this to our attention. We indeed limited our analysis – including univariate, specific reinstatement, and gist-like analyses – to only correctly remembered items. This decision was made because our goal was to observe delay-related changes in the neural correlates of correct memories, which are potentially stronger. We have incorporated this information into the manuscript.

    (4) To what extent could performance differences be impacting the differences observed across age groups? I think (see prior comment) that the analyses were probably limited to correct trials, which is helpful, but still yields pretty big differences across groups in terms of the amount of data going into each analysis. In general, children showed more attenuated neural effects (e.g., recent/remote or session effects); could this be explained by their weaker memory? Specifically, if only correct trials are considered that means that fewer trials would be going into the analysis for kids, especially for the 14-day remote memories, and perhaps pushing the remote > recent difference for this condition towards 0. The authors might be able to address this analytically; for example, does the remote > recent difference in the univariate data at day 14 correlate with day 14 memory?

    Thank you for pointing this out. Indeed, there was a significant relationship between the remote > recent difference in the univariate data and memory performance at day 14 across both age groups (see Figure 4C-D). The performance of all participants, including children, was above chance level for remote trials on day 14. In addition, although the number of remote trials was lower in children (18 trials on average) than in adults (22 trials on average), we believe that the number of remote trials was not too low or too different across groups for the contrast.
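
    One way to check performance against chance in a three-alternative task is an exact one-sided binomial test. The sketch below is illustrative only: it assumes 30 remote trials per session and a guessing rate of 1/3, treats the average counts quoted above as numbers of correct responses, and is not necessarily the test used in the manuscript.

```python
from math import comb


def binomial_tail(successes, trials, chance):
    """Exact probability of at least `successes` correct responses under guessing."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumed values: 30 remote trials, three response options (chance = 1/3).
p_children = binomial_tail(18, 30, 1 / 3)
p_adults = binomial_tail(22, 30, 1 / 3)
```

    Under these assumptions, both tail probabilities fall below the conventional .05 threshold, so counts of this size would be unlikely to arise from guessing alone.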

    (5) Some of the univariate results reporting is a bit strange, as they are relying upon differences between retrieval of 1- vs. 14-day memories in terms of the recent vs. remote difference, and yet don't report whether the regions are differently active for recent and remote retrieval. For example, in Figure 3A, neither anterior nor posterior hippocampus seem to be differentially active for recent vs. remote memories for either age group (i.e., all data is around 0). This difference from zero or lack thereof seems important to the message - is that correct? If so, can the authors incorporate descriptions of these findings?

    Thank you for this valuable input. When examining recent and remote retrieval separately, both the anterior and posterior regions of the hippocampus indeed exhibited activation significantly different from zero in adults (all p < .0003, FDR-corrected) and children (all p < .014, FDR-corrected, except for recent posterior hippocampus) during all delays. We include this information in the manuscript (see p. 17) and add it to the supplementary materials (Figure S2, Table S7).
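
    For readers unfamiliar with FDR correction, the sketch below shows the Benjamini-Hochberg procedure for adjusting a family of p-values. The response does not state which FDR procedure was applied, so the choice of Benjamini-Hochberg is an assumption, and the raw p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four region-by-delay tests against zero.
raw = [0.001, 0.02, 0.03, 0.8]
adj = benjamini_hochberg(raw)
```

    An effect is then reported as significant when its adjusted p-value falls below the chosen threshold.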

    (6) Please provide more details about the choices available for locations in the 3AFC task. (1) Were they different each time, or always the same? If they are always the same, could this be a motor or stimulus/response learning task? (2) Do the options in the 3AFC always come from the same area - in which case the participant is given a clue as to the gist of the location/memory? Or are they sometimes randomly scattered across the image (in which case gist memory, like at a delay, would be sufficient for picking the right option)? Please clarify these points and discuss the logic/impact of these choices on the interpretation of the results.

    Thank you for pointing this out. During learning and retrieval, we employed the 3AFC (Three-Alternative Forced Choice) task.

    The choices for locations varied across scenes but remained the same across time within individuals. There were 18 different key locations for the objects, distributed across the stimulus set. This means the locations of the objects were quite heterogeneous and differed between objects. The location of the object within the task was presented once during encoding and remained consistent throughout learning. Given the location heterogeneity, we believe our task cannot be reduced to a mere “stimulus/response learning task” but is more accurately described as an object-location association task.

    As described above, the options for the 3AFC task did not originate from the same area, as there were 18 different areas in total. The position of the correct answer was distributed equally across the three choice options: sometimes the “correct” answer was the left option, sometimes the middle option, and sometimes the right option. Therefore, we believe that the 3AFC task did not provide clues to the location but required detailed and precise memory of the location. Moreover, the options were not randomly scattered but rather presented close together in the scene, demanding a high level of differentiation between choices.

    Taking all the above into consideration, we assert that precise object-location associative memory is necessary for a correct answer. We have added this information to the manuscript (p. 9).

    (7) Often p values are provided but test statistics, effect sizes, etc. are not - please include this information. It is at times hard to tell whether the authors are reporting main effects, interactions, pairwise comparisons, etc.

    Thank you for bringing this to our attention. We realize that including this information in the Tables may not be the most straightforward approach. Therefore, we have incorporated the test statistics, effect sizes, and related details into the text of the results section for clarity.

    (8) There are not enough methodological details in the main paper to make sense of the results. For example, it is not clear from reading the text that there are new object-location pairs learned each day.

    Thank you for pointing this out. We have added this information to the main manuscript. Additionally, we have emphasized this information in the text referring to Figure 1B.

    (9) The retrieval task does not seem to require retrieval of the scene itself, and as such it would be helpful for the authors to both explain their reasoning for this task to measure reinstatement. Strictly speaking, participants could just remember the location of the object on the screen. Was it verified that children and adults were recalling the actual scene rather than just the location (e.g. via self-report)? It's possible that there may be developmental differences in the tendency to reinstate the scene depending on e.g., their strategy.

    Thank you for highlighting this important point. Indeed, the retrieval task included explicit instructions for participants to recall and visualize the scene associated with the object presented during the fixation time window. Participants were also instructed to recollect the location of the object within the scene. Since the location was contextually bound to the scene and each object had a unique location in each scene, the location of the object was always embedded in the specific scene context. We have added this information to both the Methods and Results sections.

    In their self-reports (which unfortunately were not systematically collected on all occasions), participants indicated that when they could recall the scene and the location thanks to the stories created during strategic encoding, this aided their memory for the scene and location immensely. We also concur with your observation that children and young adults may differ in their ability to reinstate scenes, depending on the success of their employed recall strategies. This task was conducted with an awareness of potential developmental differences in the ability to form complex contextual memories. Our elaborative learning procedure was designed to minimize these differences. It is important to note, though, that we did not expect children to achieve performance levels fully comparable to adults. There may indeed be developmental differences in reinstatement, for example due to differences in knowledge availability and accessibility (Brod, Werkle-Bergner, & Shing, 2013). We think that these differences may underlie our findings of neural reinstatement. This is now discussed on pp. 34-35 and 39-43 of the manuscript.

    (10) In general I found the Introduction a bit difficult to follow. Below are a few specific questions I had.

    a. At points findings are presented but the broader picture or take-home point is not expressed directly. For example, lines 112-127, these findings can all be conceptualized within many theories of consolidation, and yet those overarching frameworks are not directly discussed (e.g., that memory traces go from being more reliant on the hippocampus to more on the neocortex). Making these connections directly would likely be helpful for many readers.

    Thank you for bringing this to our attention. We have incorporated a summary of the general frameworks of memory consolidation into the introduction. This addition outlines how our summarized findings, particularly those related to memory consolidation for repeatedly learned information, align with these frameworks (see lines 126-138, 146-150).

    b. Lines 143-153 - The comparison of the Tompary & Davachi (2017) paper with the Oedekoven et al. (2017) reads like the two analyses are directly comparable, but the authors were looking at different things. The Tompary paper is looking at organization (not reinstatement); while the Oedekoven et al. paper is measuring reinstatement (not organization). The authors should clarify how to reconcile these findings.

    Thank you for highlighting this aspect. We have revised how we present the results from Tompary & Davachi (2017). This study examined memory reorganization for memories both with and without overlapping features, and it observed higher neural similarity for memories with overlapping features over time. The authors also explored item-specific reinstatement for recent and remote memories by assessing encoding-retrieval similarity. Since Oedekoven et al. (2017) utilized a similar approach, their results are comparable in terms of reinstatement. We have updated and expanded our manuscript to clarify the parallels between these studies (see lines 157-162).

    c. Line 195-6: I was confused by the prediction of "stable involvement of HC over time" given the work reviewed in the Introduction that HC contribution to memory tends to decrease with consolidation. Please clarify or rephrase.

    Drawing on the Contextual Binding Theory (Yonelinas et al., 2019), as well as the Multiple Trace Theory (Nadel et al., 2000) and supported for instance by evidence from Sekeres et al. (2018), we hypothesized that detailed contextual memories formed through repeated and strategic learning would strengthen the specificity of these memories, resulting in consistent hippocampal involvement for successfully recalled contextualized detailed memories. We have included additional explanatory information in the manuscript to clarify this hypothesis (see lines 217-219).

    d. Lines 200-202: I was a bit confused about this prediction. Firstly, please clarify whether immediate reinstatement has been characterized in this way for kids versus adults. Secondly, don't adults retain gist more over long delays (with specific information getting lost), at least behaviourally? This prediction seems to go against that; please clarify.

    Thank you for raising this important point. Indeed, there are no prior studies that examined memory reinstatement over extended durations in children. The primary existing evidence suggests that neural specificity or patterns of neural representations in children can be robustly observed, while neural selectivity or univariate activation in response to the same stimuli tends to mature later (i.e., Fandakova et al., 2019). Bearing this in mind and recognizing that such neural patterns can be observed in both children and adults, we hypothesized that adults may form stronger detailed contextual memories compared to children. By employing strategies such as creating stories, adults might more easily recall scenes without the need to resort to forming generic or gist-like memories (for example, 'a red fox was near the second left pine tree in a spring green forest'). This assumption aligns with the Fuzzy Trace Theory (Reyna & Brainerd, 1995), which posits that verbatim memories can be created without the extraction of a gist.

    Conversely, we hypothesized that children, due to the ongoing maturation of associative and strategic memory components (as discussed in Shing et al., 2008 and 2010), which are dependent respectively on the hippocampus (HC) and the prefrontal cortex (PFC), would be less adept at creating, retaining, and extracting stories to aid their retrieval process. This could result in them remembering more generic integrated information, like the relationship between a fox and some generic image of a forest. We have added explanatory information to the manuscript to elucidate these points (see lines 225-230).

    Reviewer #1 (Recommendations For The Authors):

    (1) For Figure 3, I would highly recommend changing the aesthetics for the univariate data - at least on my screen they appear to be open boxes with solid vs. dashed lines, and as such look identical to the recent vs. remote distinction in Figure 2B. It also doesn't match the legend for me, which shows the age groups having purple vs. yellow coloring.

    Thank you for this observation. We have adjusted Figure 2 (now Figure 3) (please refer to p. 14) accordingly, now utilizing purple and yellow colors to distinguish between the age groups.

    (2) Lines 329-330, it is not true that "all" indices were significant from zero but this is only apparent if you read the next sentence. Please rephrase to clarify. e.g., "All ... indices with a few exceptions ... were significantly..."?

    Based on the above suggestions and considering our primary focus on time-related changes in scene-specific reinstatement, we will refrain from further interpreting the relative expression of individual scene-specific indices against 0. Consequently, we have removed this information from our analysis.

    (3) It is challenging to interpret some of the significance markers, such as those in Figure 3. For example what effects are being denoted by the asterisks and bars above vs. below the data on panel D? Please clarify and/or note in the legend.

    We have included a note in the legend to clarify the meaning of all significance markers. In addition, we decided to state any significant main and interaction effects in the figure rather than use significance markers.

    (4) For Figures 2 and 3, only the meaning of error bars is described in the caption. It is not explained in the caption what the boxes, lines, and points denote. Please clarify.

    Thank you for highlighting this. We have added explanations to the figure annotations for clarity. Please note that, following other reviewers’ suggestions, some plots may have been adjusted or changed, and the explanations in the figure annotations have been adjusted accordingly.

    (5) How were recent and remote interspersed relative to one another? The text says that each run had 10 recent and 10 remote pairs, presented in a "pseudo-random order" - not clear what that (pseudo) means in this case. Please clarify.

    Thank you for raising this point. We provide this information in the Methods section “Materials and Procedure”: 'The jitters and the order of presentation for recent and remote items were determined using OptimizeXGUI (Spunt, 2016), following an exponential distribution (Dale, 1999). Ten unique recently learned pairs (from the same testing day) and ten unique remotely learned items (from Day 0) were distributed within each run (in total three runs) in the order as suggested by the software as the most optimal. There were three runs with unique sets of stimuli each resulting in thirty unique recent and thirty unique remote stimuli overall.'
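
    To make the notion of a "pseudo-random" order with exponentially distributed jitters concrete, the sketch below generates one run of 10 recent and 10 remote trials. It does not reproduce the OptimizeXGUI optimization, which selects orders for design efficiency; the repetition constraint, the jitter bounds, and all names are hypothetical choices made for illustration.

```python
import random


def truncated_exponential(rng, mean_excess, low, high):
    """Draw one jitter (in seconds): `low` plus an exponential excess, capped at `high`."""
    while True:
        value = low + rng.expovariate(1.0 / mean_excess)
        if value <= high:
            return value


def make_run(rng, n_recent=10, n_remote=10, max_repeat=3):
    """Shuffle trial types until no type occurs more than `max_repeat` times in a row."""
    trials = ["recent"] * n_recent + ["remote"] * n_remote
    while True:
        rng.shuffle(trials)
        longest = streak = 1
        for prev, cur in zip(trials, trials[1:]):
            streak = streak + 1 if cur == prev else 1
            longest = max(longest, streak)
        if longest <= max_repeat:
            return list(trials)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
order = make_run(rng)
# Hypothetical jitter bounds of 3 to 7 seconds, one jitter per trial.
jitters = [truncated_exponential(rng, mean_excess=1.0, low=3.0, high=7.0) for _ in order]
```

    The order is random but constrained, which is what "pseudo-random" typically denotes, and short jitters are more frequent than long ones because of the exponential shape.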

    (6) Figure 1A, second to last screen on the learning cycles row - what would be presented to participants here, one of these three emojis? What does the sleepy face represent? I see some of these points were mentioned in the methods, but additional clarification in the caption would be helpful.

    Thank you for highlighting this. We have included this information in the figure caption. Specifically, the sleepy face symbol in the figure denotes a 'missed response'.

    (7) Not clear how the jittered fixation time between object presentation and scene test is dealt with in representational similarity analyses.

    Thank you for pointing this out. Beta estimates were obtained from a Least Squares Separate (LSS) regression model. Each event was modeled with its respective onset and duration and, as such, one beta value was estimated per event (with the lags between events differing from trial to trial). We have edited the corresponding section (see p. 53).
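
    The logic of an LSS model can be sketched in a few lines: for each event, a separate regression is fitted with one regressor for that event and one regressor for all remaining events, and only the beta of the event of interest is kept. The example below is a deliberate simplification and not the pipeline used for the manuscript: it uses plain boxcar regressors without convolution with a hemodynamic response function, a single simulated voxel, and hypothetical onsets and durations in units of scans.

```python
def boxcar(n_scans, events):
    """Regressor that is 1 during each (onset, duration) event and 0 elsewhere."""
    x = [0.0] * n_scans
    for onset, duration in events:
        for t in range(onset, min(onset + duration, n_scans)):
            x[t] = 1.0
    return x


def centered(x):
    """Subtract the mean, which is equivalent to fitting an intercept."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def lss_betas(y, events):
    """One beta per event, each from its own two-regressor least-squares fit."""
    n = len(y)
    yc = centered(y)
    betas = []
    for k, event in enumerate(events):
        others = events[:k] + events[k + 1:]
        a = centered(boxcar(n, [event]))   # event of interest
        b = centered(boxcar(n, others))    # all other events combined
        saa = sum(v * v for v in a)
        sbb = sum(v * v for v in b)
        sab = sum(u * v for u, v in zip(a, b))
        say = sum(u * v for u, v in zip(a, yc))
        sby = sum(u * v for u, v in zip(b, yc))
        # Closed-form solution of the 2x2 normal equations for the target beta.
        betas.append((say * sbb - sby * sab) / (saa * sbb - sab * sab))
    return betas


# Hypothetical events with different lags and durations (jittered design).
events = [(2, 2), (8, 3), (15, 2)]
amplitudes = [1.0, 2.0, 3.0]
signal = [0.5] * 20  # constant baseline
for (onset, duration), amp in zip(events, amplitudes):
    for t in range(onset, onset + duration):
        signal[t] += amp
betas = lss_betas(signal, events)
```

    Because each event carries its own onset and duration, varying lags between events change the shape of the regressors but not the logic of the estimation; in this noise-free toy example the estimated betas recover the simulated amplitudes.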

    (8) It was a little bit strange to have used anterior vs posterior HPC ROIs separately in univariate analysis but then combined them for multivariate. There are many empirical and theoretical motivations for looking at item-specific and category reinstatement in anterior and posterior HPC separately, so I was surprised not to see this. Please explain this reasoning.

    Thank you for pointing this out. We agree with the reviewer and included the anterior and posterior HC ROIs into the multivariate analysis. Please see the revised results section (pp. 13-15).

    (9) The term "neural specificity" is introduced (line 164) without explanation; please clarify.

    Thank you for bringing this to our attention. The term ‘neural specificity’ refers to the neural representational distinctiveness of information. In other words, ‘neural specificity,’ as defined by Fandakova et al. (2019), refers to the distinctiveness of neural representations in the regions that process that sensory input. We decided, however, to refrain from using this term and instead to use ‘neural representational distinctiveness’, which is more self-explanatory and was also introduced in the manuscript.

    (10) Age range is specified as 5-7 years initially (line 187) and then 6-7 years (line 188).

    We have corrected the age range in line 188 to '5 to 7 years.'

    Reviewer #2 (Public Reviews):

    Schommartz et al. present a manuscript characterizing neural signatures of reinstatement during cued retrieval of middle-aged children compared to adults. The authors utilize a paradigm where participants learn the spatial location of semantically related item-scene memoranda which they retrieve after short or long delays. The paradigm is especially strong as the authors include novel memoranda at each delayed time point to make comparisons across new and old learning. In brief, the authors find that children show more forgetting than adults, and adults show greater engagement of cortical networks after longer delays as well as stronger item-specific reinstatement. Interestingly, children show more category-based reinstatement, however, evidence supports that this marker may be maladaptive for retrieving episodic details. The question is extremely timely both given the boom in neurocognitive research on the neural development of memory, and the dearth of research on consolidation in this age group. Also, the results provide novel insights into why consolidation processes may be disrupted in children. Despite these strengths, there are quite a few important design and analytical choices that derail my enthusiasm for the paper. If the authors could address these concerns, this manuscript would provide a solid foundation to better understand memory consolidation in children.

    We thank the reviewer for both the positive and critical appraisal of our paper.

    Reviewer #2 (Recommendations For The Authors):

    (1) My greatest concern is the difference in memory accuracy that emerges as soon as immediate learning, which undermines the interpretation of any consolidation-related differences. This concern is two-fold. The authors utilize an adaptive learning approach in which participants learn to criteria or stop after 4 repetitions. This type of approach leads to children seeing the stimuli more often during learning compared to adults, which on its own could have consequences for consolidation-related neural markers. Specifically, theoretical and empirical work in adults shows that repeating information can actually lead to more gist-like representations, which is the exact profile the children are showing. While there could be a strength to this approach because it allows for equivocal memory, the decision to stop repetitions before criteria means that memory performance is significantly lower in the children, which again could have consequences to consolidation-related neural markers. First, the authors do not show any of the learning-related data which would be critical to assess the impact of this design choice. Second, there are likely differences in memory strength at the delay, making it extremely difficult to determine if the neural markers reflect development, worse memory strength, or both. This issue is compounded by the use of a 3-AFC paradigm, wherein "correct responses" included in the analysis could contain a significant amount of guessing responses. I think a partial solution to this problem is to analyze the RT data and include them in the analyses or use a drift-diffusion modeling approach to get more precise estimates of memory strength to control for this feature. An alternative is to sub-select participants in each group to have a sample matched on performance (including # of repetitions) and re-run all the analyses in this sub-sample. Without addressing these concerns it is near impossible to interpret the presented data.

    Thank you for highlighting this point.

    Firstly, we believe that our approach, involving strategic and repeated learning coupled with feedback, enhances the formation of detailed contextual memories. The retrieval procedure also emphasized the need for detailed memory for location. These are critical differences in experimental procedure from previous studies, which enhanced the importance of detailed representations and likely reduced the likelihood of forming gist-like memories.

    Indeed, we ceased further learning after the fourth repetition. Extensive piloting, where we initially stopped after the seventh repetition, showed no improvement beyond the fourth repetition. In fact, performance tended to decline due to fatigue. Therefore, we limited the number of repetition cycles to the point where an improvement of performance was still feasible. Even though children exhibited lower final learning performance overall, we believe our procedure facilitated them to reach their maximal performance within the experimental setup.

    To address the reviewer’s concern, we included learning data to illustrate the progression of learning (see Fig. 1C, pp. 9-10 in Results).

    When interpreting the retention rates, it is important to note that we reported retention rates only for items that were correctly learned (100%) on day 0, day 1, and day 14. This approach meant that different participants had varying numbers of items learned correctly. However, this method enabled us to address our primary question: whether memory consolidation, based on all items initially encoded successfully, is comparably robust between the groups. To simultaneously examine the change in retention rate slopes over time for recent (30 minutes after learning), short delay (one night after) remote, and long delay (two weeks after) remote items, we conducted a separate analysis of retention rates for recent items on days 1 and 14. After observing no differences between sessions in both age groups, we combined the data for recent items. This allowed us to investigate how the slope of memory retention for initially correctly learned items (with a baseline of 100%) changes over time. We observed a significant interaction between item type (recent, short delay remote, long delay remote) and group. Analysis of this interaction revealed significantly less robust memory consolidation across all delay times for children compared to young adults. The figures have been adjusted accordingly to incorporate the baseline of 100% correct performance.
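
    For illustration, the following minimal sketch computes a retention rate over only those items that were answered correctly at the end of learning, so that the baseline is 100% by construction. The function name and the item identifiers are hypothetical and are not taken from the manuscript.

```python
def retention_rate(learned_items, recalled_items):
    """Percentage of initially learned items that are still recalled at test.

    learned_items:  ids of items answered correctly at the end of learning
    recalled_items: ids of items answered correctly at the delayed test
    """
    learned = set(learned_items)
    if not learned:
        raise ValueError("no correctly learned items; retention is undefined")
    # Items answered correctly at test but never learned are ignored,
    # because they are not part of the 100% baseline.
    retained = learned & set(recalled_items)
    return 100.0 * len(retained) / len(learned)


# Hypothetical participant: 4 items learned, 3 of them recalled after the delay.
print(retention_rate(["a", "b", "c", "d"], ["a", "b", "d", "e"]))  # 75.0
```

    Because the denominator differs between participants, a measure of this kind compares forgetting of what was learned rather than the absolute number of items remembered.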

    Following your suggestion, we also employed the drift diffusion model approach to characterize memory strength, calculating drift rate, boundary and non-decision time parameters. We added the results to the Supplementary Materials (section S2.1, Figure S1).

    Generally, our findings indicate a lower overall drift rate in children when considering all items that had to be learned. We also observed that adults show a steeper decline in drift rate across the short and long delays, although their memory strength nevertheless remained higher than that of children. Both age groups required a similar amount of evidence to make a decision, and this amount declined with delay, which may indicate an adaptation to weaker memory. Further, we observed shorter non-decision time in children compared to adults, potentially suggesting less error checking, or less thorough processing and strategic memory access, in children.
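
    To make the three parameters concrete, the sketch below uses the closed-form EZ-diffusion equations (Wagenmakers et al., 2007), which map accuracy and the mean and variance of correct response times onto drift rate, boundary separation, and non-decision time. This is an illustrative stand-in and not necessarily the fitting procedure used for the supplementary analysis; the EZ equations assume a two-choice task, so applying them to a 3-AFC design is a simplification, and the input values are hypothetical.

```python
import math


def ez_diffusion(prop_correct, rt_variance, rt_mean, s=0.1):
    """Closed-form EZ-diffusion estimates for a two-choice task.

    prop_correct: proportion of correct responses (strictly between 0 and 1, not 0.5)
    rt_variance:  variance of correct response times, in seconds squared
    rt_mean:      mean of correct response times, in seconds
    s:            scaling parameter (0.1 by convention)
    Returns (drift_rate, boundary_separation, non_decision_time).
    """
    if not 0.0 < prop_correct < 1.0 or prop_correct == 0.5:
        raise ValueError("apply an edge correction to prop_correct first")
    if rt_variance <= 0.0:
        raise ValueError("rt_variance must be positive")
    logit = math.log(prop_correct / (1.0 - prop_correct))
    x = logit * (logit * prop_correct ** 2 - logit * prop_correct
                 + prop_correct - 0.5) / rt_variance
    # Drift rate: quality of the evidence (memory strength in this context).
    drift = math.copysign(1.0, prop_correct - 0.5) * s * x ** 0.25
    # Boundary separation: amount of evidence required before responding.
    boundary = s ** 2 * logit / drift
    y = -drift * boundary / s ** 2
    mean_decision_time = (boundary / (2.0 * drift)) * (1.0 - math.exp(y)) / (1.0 + math.exp(y))
    # Non-decision time: encoding and motor components of the response time.
    non_decision = rt_mean - mean_decision_time
    return drift, boundary, non_decision


# Hypothetical summary statistics for one participant and one condition.
v, a, ter = ez_diffusion(prop_correct=0.802, rt_variance=0.112, rt_mean=0.723)
print(round(v, 3), round(a, 3), round(ter, 3))  # 0.1 0.14 0.3
```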

    Overall, these results indicate weaker memory strength in children as a quantitative measure. It may nevertheless stem from qualitatively different memory representations that children form, as our RSA findings suggest. We believe that our neural effect reflects the effect of interest (i.e., worse memory due to lower memory strength in children). Controlling for memory strength would therefore remove variance of interest from the neural data, so we refrain from including it in the model. However, we will include mean RT as an indicator of general response tendencies.

    Given that the paper is already very complex and long, we opted to add the diffusion model results to the Supplementary Materials (section S2.1, Fig. S1), while discussing the results in the discussion (p. 35).

    (2) More discussion of the behavioral task should be included in the results, in particular the nature of the adaptive learning paradigm including the behavioral results as well as the categorical nature of the memoranda. Without this information, it is difficult for the reader to understand what category-level versus item-level reinstatement reflects.

    Thank you for this valuable input. We have incorporated this information into the results section. Please refer to pp. 9-10, 12, 14, 21, 25-26 for the added details.

    (3) Some of the methods for the reinstatement analysis were unclear to me or warranted further adjustment. I believe the authors compared the scene against all other scenes. I believe it would be more appropriate to only compare this against scenes drawn from the same category as opposed to all scenes. Secondly, from my reading, it seems like the reinstatement was done during the scene presentation, rather than the object presentation in which they would retrieve the scene. I believe the reinstatement results would be much stronger if it was captured during the object presentation rather than the re-presentation of the scene. Or perhaps both sets of analyses should be included.

    We apologize for the confusion regarding the analysis method.

    During the review process we have improved the description of this analysis and hope it is easier to follow now. In short, we used both approaches (within and between categories) to suit different goals (i.e., measuring scene-specific reinstatement and gist-like reinstatement).

    Both types of reinstatement were assessed during the fixation cross to avoid confounds with the object itself being on the screen. We only used the scene window in one analysis (scene-reinstatement index) as a neural template to track its pre-activation during the fixation. So, as the reviewer suggests, our rationale is that the reinstatement indeed starts taking place at the short object presentation window, but importantly, extends to the fixation window. We added this clarifying information to the results section (see pp. 21-27).

    (4) For the univariate results, it was unclear to me when reading the results whether they were focusing on the object presentation portion of the trial or the scene presentation portion of the trial. Again, I think the claims of reinstatement related activity would be stronger if they accounted for the object presentation period.

    Thank you for pointing this out. Indeed, the univariate results were based on the object presentation time window. We added this information to the results section (Fig. 3, pp. 14, 16).

    (5) Further, given the univariate differences shown across age groups, the authors should re-run all analyses for the RSA controlling for mean activation within the ROI.

    Thank you for highlighting this. We re-ran all RSA analyses controlling for the mean activation within the ROI. The results remained unchanged. We have added this information to the results section, with further details in Tables S8 and S11 in the Supplementary Materials.
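
    The response does not state how mean activation was controlled for. One common option, sketched here purely as an assumption, is to regress the similarity index on mean ROI activation across observations and carry the residuals forward. The function name and inputs are hypothetical.

```python
from statistics import mean


def residualize(values, covariate):
    """Remove the linear effect of `covariate` from `values` (simple OLS).

    values:    e.g. one reinstatement index per participant for a given ROI
    covariate: e.g. mean activation in the same ROI for the same participants
    """
    if len(values) != len(covariate) or len(values) < 3:
        raise ValueError("need two equally long sequences with at least 3 entries")
    mx, my = mean(covariate), mean(values)
    sxx = sum((x - mx) ** 2 for x in covariate)
    if sxx == 0:
        raise ValueError("covariate is constant")
    slope = sum((x - mx) * (y - my) for x, y in zip(covariate, values)) / sxx
    # Residuals keep the part of `values` that the covariate cannot explain.
    return [y - (my + slope * (x - mx)) for x, y in zip(covariate, values)]


# Hypothetical data: similarity index and mean activation for four participants.
print(residualize([0.10, 0.30, 0.20, 0.50], [1.0, 2.0, 3.0, 4.0]))
```

    An equivalent alternative is to enter mean activation as a covariate in the group-level model, which estimates the same adjustment in a single step.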

    (6) The authors should include explicit tests across groups for their brain-behavior analyses if they want to make any developmentally relevant interpretations of the data. Also, It would be helpful to include similar analyses to those using the univariate signals, and not just the RSA results.

    Following the reviewer's suggestion, we included brain-behavior analyses for univariate data as well as RSA data with explicit tests across groups. These can be found in the Results Section pp. 18-20, 28-32. Due to the interdependence of predefined ROIs and to avoid running a high number of correlation tests, we employed partial least squares correlation analysis for this purpose. This approach focuses on multivariate links between specified Regions of Interest (ROIs) and fluctuations in memory performance over short and long delays across different age cohorts. We argue that this multivariate strategy offers a more comprehensive understanding of the relationships between brain metrics across various ROIs and memory performance, given their mutual dependence and connectivity (refer to Genon et al. (2022) for similar discussions).

    (7) There could be dramatic differences in memory processing across 5-7 year olds. I know the sample is a little small for this, but I would like to see regressions done within the middle childhood group in addition to the across-group comparisons.

    We have included information detailing the relationship between memory retention rate and age within the child group (refer to p. 13). In the child group, both recent and short delay remote memory improved with age. However, the retention rate for long-delayed memory did not show a significant improvement with increasing age in children.

    (8) I am concerned that the authors used global-signal as a regressor in their first-level analyses, given that there could be large changes in the amount of univariate activation that occurs across groups. This approach can lead to false positives and negatives that obscure localized differences. The authors should remove this term, and perhaps use the mean sum of the white matter or CSF to achieve the noise regressor they wanted to include.

    We understand the reviewer's concern. However, we believe that our approach is recommended for the pediatric population. Specifically, Graff et al., 2021, found that global signal regression is a highly efficacious denoising technique in their study of 4 to 8-year-old children. This technique was previously suggested for adults by Ciric et al., 2017, and the benefits in terms of motion and physiological noise removal outweigh the potential costs of removing some signal of interest, as indicated by Behzadi et al., 2007. Additionally, we incorporated six anatomical component-based noise correction (CompCor) components to account for WM and CSF signals, as recommended in the pediatric literature.

    (9) The authors discuss the relationship between hippocampal reactivation and worse memory through the lens of Schapiro et al., but a new paper by Tanriverdi et al came out in JOCN recently that is more similar to the authors' findings.

    Thank you for highlighting the recent paper by Tanriverdi et al. in JOCN, which aligns closely with our findings. We appreciate the suggestion and agree that exploring this alignment could further enrich our discussion on the relationship between hippocampal reactivation and memory retention. We incorporated this work in our revised manuscript.

    Minor Comments

    - I was surprised that the authors did not see any differences in univariate signals for memory retrieval as a function of development, as much of the prior work has shown differences (for example work by Tracy Riggins). I believe this contrast should be highlighted in the discussion.

    - Given the robust differences in sleep patterns across childhood and the role of sleep in systems consolidation framework, I think this feature should be highlighted in either the introduction or discussion.

    - Could the authors report on differences (or lack of differences) in head motion across the groups, and if they are different whether they could include them as a confounding variable.

    I believe we included six motion parameters and their derivatives into the model

    Thank you for your comments.

    First, prior work on univariate signals of memory retrieval focused mostly on remembered vs. forgotten contrasts, while in our study we focused on remote vs. recent in short and long delay only for correctly remembered items. This can partially explain the results. We highlighted this information in the discussion section.

    Second, we agree with the reviewer that sleep patterns across childhood should be addressed. Therefore, we now cover them in the discussion section.

    Third, head motion parameters were indeed included in the analysis as confounding variables, as adding them is highly recommended for the developmental population (e.g., Graff et al. 2021). As an example, we observed higher framewise displacement in children compared to adults, t(218) = -16, p < .001, as well as higher translation along y, t(288) = -2.33, p = .02.
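
    The response does not say how framewise displacement was computed. A widely used definition sums the absolute volume-to-volume changes in the six rigid-body parameters, with rotations converted to millimetres on a sphere of 50 mm radius. The sketch below assumes that definition, and also assumes a column order of three translations (mm) followed by three rotations (radians); both are assumptions rather than details from the manuscript.

```python
def framewise_displacement(motion, radius_mm=50.0):
    """Framewise displacement per volume from six rigid-body parameters.

    motion: list of [tx, ty, tz, rx, ry, rz] per volume, with translations
            in mm and rotations in radians (assumed column order).
    """
    if not motion:
        return []
    fd = [0.0]  # the first volume has no predecessor
    for prev, curr in zip(motion, motion[1:]):
        trans = sum(abs(c - p) for p, c in zip(prev[:3], curr[:3]))
        # Rotations are converted to arc length on a sphere of the given radius.
        rot = sum(abs(c - p) for p, c in zip(prev[3:], curr[3:])) * radius_mm
        fd.append(trans + rot)
    return fd


# Hypothetical realignment output for three volumes.
params = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.002],
    [0.1, 0.2, 0.0, 0.0, 0.0, 0.002],
]
print(framewise_displacement(params))
```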

    Reviewer #3 (Public Reviews):


    This study aimed to understand the neural correlates of memory recall over short (1-day) and long (14-days) intervals in children (5-7 years old) relative to young adults. The results show that children recall less than young adults and that this is accompanied by less activation (relative to young adults) in brain networks associated with memory retrieval.


    This paper is one of few investigating long-term memory (multiple days) in a developmental population, an important gap in the field. Also, the authors apply a representational similarity analysis to understand how specific memories evolve over time. This analysis shows how the specificity of memories decreases over time in children relative to adults. This is an interesting finding.

    We thank the reviewer for the appraisal of our manuscript.


    Overall, these results are consistent with what we already know: recall is worse in children relative to adults (e.g., Cycowicz et al., 2001) and children activate memory retrieval networks to a lesser extent than adults (Bauer et al, 2017).

    It seems that the reduced activation in memory recall networks is likely associated with less depth of memory encoding in children due to inattentiveness, reduced motivation, and documented differences in memory strategies. In regard to this, there was consideration of IQ, sex, and handedness but these were not included as covariates as they were not significant although I note p<.16 suggests there was some level of association nonetheless. Also, IQ is measured differently for the children and adults so it's not clear these can be directly contrasted. The authors suggest the instructed elaborative encoding strategy is effective for children and adults but the reference in support of this (Craik & Tulving, 1975) does not seem to support this point.

    Thank you for your review, and we appreciate your valuable feedback. Here are our responses and clarifications:

    Regarding the novelty of the results relative to the existing literature, we believe that, in contrast to Cycowicz et al. (2001), Bauer et al. (2017), and related work, we go beyond assessing immediate memory after encoding with semantic judgement of abstract associations. We add to these findings by investigating consolidation-related changes in complex associative and contextual information in a much under-investigated sample of 5-to-7-year-old preschoolers. This also allows us to infer how neural representations in children change over time, providing valuable insights into knowledge formation in this developmental cohort.

    In this respect, the observed age differences are of less primary importance than the time-related changes in mnemonic representations observed in children.

    Regarding the assumption of inattentiveness in children, we want to emphasize that the experimenter was present throughout the learning process, closely supervising the children. We observed prompt responses to every trial in children and noted an increase in accuracy over the encoding-learning cycles, leading us to conclude that the children were indeed attentive to the task. The observed accuracy improvement across learning cycles indicates an increase in remembered information. Furthermore, we took measures to ensure their engagement, including extensive training in both verbal and computerized versions to ensure that they understood and actively created stories to support their learning.

    We collected motivation data after each task execution in children, and the results indicated that they scored high in motivation. Children not only completed the tasks but also expressed their willingness to participate in subsequent appointments, highlighting their active involvement in the study.

    The observed differences in the efficiency of strategy utilization were expected, given developmental differences in the associative and strategic components of memory in children, as noted in prior research (Shing, 2008, 2010).

    We appreciate your point about IQ, sex, and handedness. These variables were indeed included in the behavioral models, and mean brain activation was also included in the brain data models, addressing the potential influence of these factors on our results.

    While it's true that we applied different tests to measure IQ in children and adults, these tests targeted comparable subtests that addressed similar cognitive constructs. As the final IQ values are standardized, we believe it is appropriate to compare them between the two groups.

    Lastly, we agree that the citation Craik & Tulving, 1975 supports the notion of effectiveness of instructed elaborative learning only in adults, but not in children. For this purpose, we added relevant literature for the child cohort (i.e., Pressley, 1982; Pressley et al., 1981; Shing et al., 2008).

    Reviewer #3 (Recommendations For The Authors):

    An additional point for the authors to consider is that the hypotheses were uncertain. The first is that prefrontal, parietal, cerebellar, occipital, and PHG brain regions would have greater activation over time in adults and not children - which is very imprecise as this is basically the whole brain. Moreover, brain imaging data may be in opposition to this prediction: e.g., the hippocampus has a delayed maturational pattern beyond 5-yrs (Canada 2019; Uematsu 2012) and some cortical data predicts earlier development in these regions.

    Thank you for your feedback, and we appreciate your insights regarding our hypotheses.

    The selection of our regions of interest (ROIs) was guided by prior literature that has demonstrated the interactive involvement of multiple brain areas in memory retrieval and consolidation processes. Additionally, our recent work utilizing multivariate partial least squares correlation analysis (Schommartz, 2022, Developmental Cognitive Neuroscience) has indicated that unique profiles derived from the structural integrity of multiple brain regions are differentially related to short and long-delay memory consolidation.

    Indeed, the literature suggests that the hippocampus may exhibit a more delayed maturational pattern extending into adolescence, as supported by studies such as Canada (2019) and Uematsu (2012), etc. We added this information as well as findings from the literature on cortical development to be more balanced in our review of the literature.

    Given this complexity, we believe it is important to emphasize in our discussion that both the medial temporal lobe, including the hippocampus, and cortical structures, as well as the cerebellum, undergo profound neural maturation. We highlight these nuances in our revised manuscript to provide a more comprehensive perspective on the developmental differences in memory retention over time.

    The writing was challenging to follow - consider as an example on page 9 the sentence that spans 10 lines of text.

    Thank you for bringing this to our attention. We have carefully reviewed the manuscript and have made efforts to streamline the text, ensuring that sentences are not overly long or complex to improve readability and comprehension.

    I found the analysis (and accompanying figures) a bit of a data mine - there are so many results that are hard to digest and in other cases highly redundant one from the other. This may be resolved in part by moving redundant findings to the supplemental. Some were hard to follow - so when there is a line between recent and recent data, that seems confusing to connect data that, I believe, are different sets of items. Later scatterplots (Fig 7) have pale yellow dots that I had a hard time seeing.

    Thank you for bringing up your concerns regarding the analysis and figures in our manuscript. We have carefully considered your feedback and made several improvements to address these issues.

    To alleviate the challenge of digesting numerous results, we have taken steps to enhance clarity and reduce redundancy. Specifically, we have moved some of the redundant findings to the supplementary sections, which should help streamline the main manuscript and make it more reader friendly.

    Regarding the line between 'recent' and 'recent data,' the figures were revised to a clearer version. Furthermore, we have improved the visibility of certain elements, such as the pale-yellow dots in the scatterplots (Fig 1, 2, 4, etc.), to ensure that readers can better discern the data points.

  6. Reviewer #1 (Public Review):

    This paper by Schommartz and colleagues investigates the neural basis of memory reinstatement as a function of both how recently the memory was formed (recent, remote) and its development (children, young adults). The core question is whether memory consolidation processes as well as the specificity of memory reinstatement differ with development. A number of brain regions showed a greater activation difference for recent vs. remote memories at the long versus shorter delay specifically in adults (cerebellum, parahippocampal gyrus, LOC). A different set showed decreases in the same comparison, but only in children (precuneus, RSC). The authors also used neural pattern similarity analysis to characterize reinstatement, though I have substantive concerns about how this analysis was performed and as such will not summarize the results. Broadly, the behavioural and univariate findings are consistent with the idea that memory consolidation differs between children and adults in important ways, and takes a step towards characterizing how.

    The topic and goals of this paper are very interesting. As the authors note, there is little work on memory consolidation over development, and as such this will be an important data point in helping us begin to understand these important differences. The sample size is great, particularly given this is an onerous, multi-day experiment; the authors are to be commended for that. The task design is also generally well controlled, for example as the authors include new recently learned pairs during each session.

    As noted above, the pattern similarity analysis for both item and category-level reinstatement was performed in a way that is not interpretable given concerns about temporal autocorrelation within the scanning run. Below, I focus my review on this analytic issue, though I also outline additional concerns.

    1. The pattern similarity analyses were not done correctly, rendering the results uninterpretable (assuming my understanding of the authors' approach is correct).

    a. First, the scene-specific reinstatement index: The authors have correlated a neural pattern during a fixation cross (delay period) with a neural pattern associated with viewing a scene as their measure of reinstatement. The main issue with this is that these events always occurred back-to-back in time. As such, the two patterns will be similar due simply to the temporal autocorrelation in the BOLD signal. Because of the issues with temporal autocorrelation within the scanning run, it is always recommended to perform such correlations only across different runs. In this case, the authors always correlated patterns extracted from the same run, which moreover have temporal lags that are perfectly confounded with their comparison of interest (i.e., from Fig 4A, the "scene-specific" comparisons will always be back-to-back, having a very short temporal lag; "set-based" comparisons will be dispersed across the run, and therefore have a much higher lag). The authors' within-run correlation approach also yields correlation values that are extremely high - much higher than would be expected if this analysis was done appropriately. The way to fix this would be to restrict the analysis to only cross-run comparisons, but I don't believe this is possible unfortunately given the authors' design; I believe the target (presumably reinstated) scene only appears once during scanning, so there is no separate neural pattern during the presentation of this picture that they can use. For these reasons, any evidence for "significant scene-specific reinstatement" and the like is completely uninterpretable and would need to be removed from the paper.

    b. From a theoretical standpoint, I believe the way this analysis was performed considering the fixation and the immediately following scene also means that the differences between recent and remote could have to do with either the reactivation (processes happening during the fixation, presumably) or differences in the processing of the stimulus itself (happening during the scene presentation). For example, people might be more engaged with the more novel scenes (recent) and therefore process those scenes more; such a difference would be interpreted in this analysis as having to do with reinstatement, but in fact could be just related to the differential scene processing/recognition, etc. It would be important when comparing scene-specific neural patterns as templates for reinstatement across conditions that, at the time of scene presentation itself, the two conditions are equal (e.g., no difference in familiarity and so on); otherwise, we do not know which trial period (and therefore which underlying process) is driving the differences.

    c. For the category-based neural reinstatement: (1) This suffers from the same issue of correlations being performed within the run. Again, to correct this the authors would need to restrict comparisons to only across runs (i.e., patterns from run 1 correlated with patterns for run 2 and so on). With this restriction, it may or may not be possible to perform this analysis, depending upon how the same-category scenes are distributed across runs. However, there are other issues with this analysis, as well. (2) This analysis uses a different approach of comparing fixations to one another, rather than fixations to scenes. The authors do not motivate the reason for this switch. Please provide reasoning as to why fixation-fixation is more appropriate than fixation-scene similarity for category-level reinstatement, particularly given the opposite was used for item-level reinstatement. Even if the analyses were done properly, it would remain hard to compare them given this difference in approach. (3) I believe the fixation cross with itself is included in the "within category" score. Is this not a single neural pattern correlated with itself, which will yield maximal similarity (Pearson r=1) or minimal dissimilarity (1-Pearson r=0)? Including these comparisons in the averages for the within-category score will inflate the difference between the "within-category" and "between-category" comparisons. These (e.g., forest1-forest1) should not be included in the within-category comparisons considered; rather, they should be excluded, so the fixations are always different but sometimes the comparisons are two retrievals of the same scene type (forest1-forest2), and other times different scene types (forest1-field1). (4) It is troubling that the results from the category reinstatement metric do not seem to conceptually align with past work; for example, a lot of work has shown category-level reinstatement in adults. Here the authors do not show any category-level reinstatement in adults (yet they do in children), which generally seems extremely unexpected given past work and I would guess has to do with the operationalization of the metric.
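
    The two constraints raised in point 1 (only cross-run comparisons, and no pattern correlated with itself) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the trial labels, and the toy patterns are all hypothetical.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation between two equally long, non-constant voxel patterns."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def category_similarity(patterns, categories, runs):
    """Mean within- and between-category similarity from cross-run pairs only.

    patterns:   one voxel pattern (list of floats) per trial
    categories: scene category label per trial
    runs:       scanning run label per trial
    """
    within, between = [], []
    n = len(patterns)
    for i in range(n):
        for j in range(i + 1, n):   # j > i, so no pattern is paired with itself
            if runs[i] == runs[j]:  # skip within-run pairs (temporal autocorrelation)
                continue
            r = pearson(patterns[i], patterns[j])
            (within if categories[i] == categories[j] else between).append(r)
    if not within or not between:
        raise ValueError("no cross-run pairs available for one of the two conditions")
    return mean(within), mean(between)


# Toy example: two categories, each retrieved once in each of two runs.
patterns = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [8, 6, 4, 2]]
categories = ["forest", "forest", "field", "field"]
runs = [1, 2, 1, 2]
print(category_similarity(patterns, categories, runs))
```

    Whether such an analysis is feasible depends on the design: if each category or scene appears in only one run, the within-category list is empty and the function raises an error rather than returning a value.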

    2. I did not see any compelling statistical evidence for the claim of less robust consolidation in children. Specifically in terms of the behavioural results of retention of the remote items at 1 vs 14 days, shown in Figure 2B, the authors conclude that memory consolidation is less robust in children (line 246). Yet they do not report statistical evidence for this point, as there was no interaction of this effect with the age group. Children had worse memory than adults overall (in terms of a main effect - i.e. across recent and remote items). If it were consolidation-specific, one would expect that the age differences are bigger for the remote items, and perhaps even most exaggerated for the 14-day-old memories. Yet this does not appear to be the case based on the data the authors report. Therefore, the behavioural differences in retention do not seem to be consolidation specific, and therefore might have more to do with differences in encoding fidelity or retrieval processes more generally across the groups. This should be taken into account when interpreting the findings.

    3. Please clarify which analyses were restricted to correct retrievals only. The univariate analysis states that correct and incorrect trials were modelled separately, but does not say which were considered in the main contrast (I assume correct only?). The item-specific reinstatement analysis states that only correct trials were considered, but the category-level reinstatement analysis does not say. Please include this detail.

    4. To what extent could performance differences be impacting the differences observed across age groups? I think (see prior comment) that the analyses were probably limited to correct trials, which is helpful, but still yields pretty big differences across groups in terms of the amount of data going into each analysis. In general, children showed more attenuated neural effects (e.g., recent/remote or session effects); could this be explained by their weaker memory? Specifically, if only correct trials are considered that means that fewer trials would be going into the analysis for kids, especially for the 14-day remote memories, and perhaps pushing the remote > recent difference for this condition towards 0. The authors might be able to address this analytically; for example, does the remote > recent difference in the univariate data at day 14 correlate with day 14 memory?

    5. Some of the univariate results reporting is a bit strange, as they are relying upon differences between retrieval of 1- vs. 14-day memories in terms of the recent vs. remote difference, and yet don't report whether the regions are differently active for recent and remote retrieval. For example in Figure 3A, neither anterior nor posterior hippocampus seem to be differentially active for recent vs. remote memories for either age group (i.e., all data is around 0). This difference from zero or lack thereof seems important to the message - is that correct? If so, can the authors incorporate descriptions of these findings?

    6. Please provide more details about the choices available for locations in the 3AFC task. (1) Were they different each time, or always the same? If they are always the same, could this be a motor or stimulus/response learning task? (2) Do the options in the 3AFC always come from the same area - in which case the participant is given a clue as to the gist of the location/memory? Or are they sometimes randomly scattered across the image (in which case gist memory, like at a delay, would be sufficient for picking the right option)? Please clarify these points and discuss the logic/impact of these choices on the interpretation of the results.

    7. Often p values are provided but test statistics, effect sizes, etc. are not - please include this information. It is at times hard to tell whether the authors are reporting main effects, interactions, pairwise comparisons, etc.

    8. There are not enough methodological details in the main paper to make sense of the results. For example, it is not clear from reading the text that there are new object-location pairs learned each day.

    9. The retrieval task does not seem to require retrieval of the scene itself, and as such it would be helpful for the authors to explain their reasoning for using this task to measure reinstatement. Strictly speaking, participants could just remember the location of the object on the screen. Was it verified that children and adults were recalling the actual scene rather than just the location (e.g., via self-report)? It's possible that there may be developmental differences in the tendency to reinstate the scene depending on, e.g., their strategy.

    10. In general I found the Introduction a bit difficult to follow. Below are a few specific questions I had.

    a. At points findings are presented but the broader picture or take-home point is not expressed directly. For example, lines 112-127, these findings can all be conceptualized within many theories of consolidation, and yet those overarching frameworks are not directly discussed (e.g., that memory traces go from being more reliant on the hippocampus to more on the neocortex). Making these connections directly would likely be helpful for many readers.

    b. Lines 143-153 - The comparison of the Tompary & Davachi (2017) paper with the Oedekoven et al. (2017) paper reads as if the two analyses are directly comparable, but the authors were looking at different things. The Tompary paper is looking at organization (not reinstatement), while the Oedekoven et al. paper is measuring reinstatement (not organization). The authors should clarify how to reconcile these findings.

    c. Line 195-6: I was confused by the prediction of "stable involvement of HC over time" given the work reviewed in the Introduction that HC contribution to memory tends to decrease with consolidation. Please clarify or rephrase.

    d. Lines 200-202: I was a bit confused about this prediction. Firstly, please clarify whether immediate reinstatement has been characterized in this way for kids versus adults. Secondly, don't adults retain gist more over long delays (with specific information getting lost), at least behaviourally? This prediction seems to go against that; please clarify.
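
    As a rough illustration of the check suggested in comment 4, the sketch below correlates a per-participant remote > recent contrast at day 14 with day 14 memory accuracy. It is a minimal sketch under stated assumptions, not the authors' analysis: the function `pearson_r`, the variable names, and all numeric values are hypothetical placeholders, and a real analysis would use the extracted ROI contrast estimates and likely a test that accounts for sample size and outliers.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length sequences with at least 3 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical placeholder values, invented purely for illustration:
# one remote-minus-recent contrast estimate and one accuracy score per child.
remote_minus_recent_day14 = [0.10, 0.05, 0.20, -0.02, 0.15]
accuracy_day14 = [0.60, 0.50, 0.80, 0.40, 0.70]

r = pearson_r(remote_minus_recent_day14, accuracy_day14)
print(f"r = {r:.2f}")
```

    A reliably positive correlation would suggest that the attenuated remote > recent effect in children tracks their weaker memory (and lower trial counts) rather than a qualitatively different consolidation process.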

  7. Reviewer #2 (Public Review):

    Schommartz et al. present a manuscript characterizing neural signatures of reinstatement during cued retrieval in middle-aged children compared to adults. The authors utilize a paradigm where participants learn the spatial location of semantically related item-scene memoranda which they retrieve after short or long delays. The paradigm is especially strong as the authors include novel memoranda at each delayed time point to make comparisons across new and old learning. In brief, the authors find that children show more forgetting than adults, and adults show greater engagement of cortical networks after longer delays as well as stronger item-specific reinstatement. Interestingly, children show more category-based reinstatement; however, the evidence suggests that this marker may be maladaptive for retrieving episodic details. The question is extremely timely given both the boom in neurocognitive research on the neural development of memory and the dearth of research on consolidation in this age group. Also, the results provide novel insights into why consolidation processes may be disrupted in children. Despite these strengths, there are quite a few important design and analytical choices that dampen my enthusiasm for the paper. If the authors could address these concerns, this manuscript would provide a solid foundation to better understand memory consolidation in children.

  8. Reviewer #3 (Public Review):

    This study aimed to understand the neural correlates of memory recall over short (1-day) and long (14-days) intervals in children (5-7 years old) relative to young adults. The results show that children recall less than young adults and that this is accompanied by less activation (relative to young adults) in brain networks associated with memory retrieval.

    This paper is one of few investigating long-term memory (multiple days) in a developmental population, an important gap in the field. Also, the authors apply a representational similarity analysis to understand how specific memories evolve over time. This analysis shows how the specificity of memories decreases over time in children relative to adults. This is an interesting finding.

    Overall, these results are consistent with what we already know: recall is worse in children relative to adults (e.g., Cycowicz et al., 2001) and children activate memory retrieval networks to a lesser extent than adults (Bauer et al., 2017).

    It seems that the reduced activation in memory recall networks is likely associated with less depth of memory encoding in children due to inattentiveness, reduced motivation, and documented differences in memory strategies. In this regard, IQ, sex, and handedness were considered but were not included as covariates because they were not significant, although I note that p<.16 suggests there was some level of association nonetheless. Also, IQ is measured differently for the children and adults, so it's not clear these can be directly contrasted. The authors suggest the instructed elaborative encoding strategy is effective for children and adults, but the reference cited for this (Craik & Tulving, 1975) does not seem to support this point.