Targeted memory reactivation in human REM sleep elicits detectable reactivation

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This valuable work in human subjects reports that sounds that were associated with specific memories during waking behaviors can trigger the reactivation of these memory representations during REM sleep. However, the evidence supporting the conclusions is currently incomplete. Still, the work has the potential to expand our understanding of memory processing during sleep.

Abstract

It is now well established that memories can reactivate during non-rapid eye movement (non-REM) sleep, but the question of whether equivalent reactivation can be detected in rapid eye movement (REM) sleep is hotly debated. To examine this, we used a technique called targeted memory reactivation (TMR) in which sounds are paired with learned material in wake, and then re-presented in subsequent sleep, in this case REM, to trigger reactivation. We then used machine learning classifiers to identify reactivation of task-related motor imagery from wake in REM sleep. Interestingly, the strength of measured reactivation positively predicted overnight performance improvement. These findings provide the first evidence for memory reactivation in human REM sleep after TMR that is directly related to brain activity during wakeful task performance.

Article activity feed

  1. Author Response

    Reviewer #2 (Public Review):

    I believe the authors succeeded in finding neural evidence of reactivation during REM sleep. This is their main claim, and I applaud them for that. I also applaud their efforts to explore their data beyond this claim, and I think they included appropriate controls in their experimental design. However, I found other aspects of the paper to be unclear or lacking in support. I include major and medium-level comments:

    Major comments, grouped by theme with specifics below:

    Theta.

    Overall assessment: the theta effects are either over-emphasized or unclear. Please either remove the high/low theta effects or provide a better justification for why they are insightful.

    Lines ~ 115-121: Please include the statistics for low-theta power trials. Also, without a significant difference between high- and low-theta power trials, it is unclear why this analysis is being featured. Does theta actually matter for classification accuracy?

    Lines 123-128: What ARE the important bands for classification? I understand the point about it overlapping in time with the classification window without being discriminative between the conditions, but it still is not clear why theta is being featured given the non-significant differences between high/low theta and the lack of its involvement in classification. REM sleep is high in theta, but other than that, I do not understand the focus given this lack of empirical support for its relevance.

    Line 232-233: "8). In our data, trials with higher theta power show greater evidence of memory reactivation." Please do not use this language without a difference between high and low theta trials. You can say there was significance using high theta power and not with low theta power, but without the contrast, you cannot say this.

    Thank you, we have taken this point on board. We thought the differences observed between classification in high and low theta power trials were interesting, but we can see why the reviewer feels there is a need for a stronger hypothesis here before reporting them. We have therefore removed this approach from the manuscript, and no longer split trials into high and low theta power.

    Physiology / Figure 2.

    Overall assessment: It would be helpful to include more physiological data.

    It would be nice, either in Figure 2 or in the supplement, to see the raw EEG traces in these conditions. These would be especially instructive because, with NREM TMR, the ERPs seem to take a stereotypical pattern that begins with a clear influence of slow oscillations (e.g., in Cairney et al., 2018), and it would be helpful to show the contrast here in REM.

    We thank the reviewer for these comments. We have now performed ERP and time-frequency analyses following a similar approach to that of Cairney et al. (2018). We have added a section in the results for these analyses as follows:

    “Elicited response pattern after TMR cues

    We looked at the TMR-elicited response in both time-frequency and ERP analyses using a method similar to the one used in Cairney et al. (2018); see Methods. As shown in Figure 2a, the EEG response showed a rapid increase in the theta band followed by an increase in the beta band starting about one second after TMR onset. REM sleep is dominated by theta activity, which is thought to support the consolidation process (Diekelmann & Born, 2010), and increased theta power has previously been shown to occur after successful cueing during sleep (Schreiner & Rasch, 2015). We therefore analysed the TMR-elicited theta in more detail. Focussing on the first second post-TMR-onset, we found that theta was significantly higher here than in the baseline period prior to the cue, [-300 -100] ms, for both adaptation (Wilcoxon signed rank test, n = 14, p < 0.001) and experimental nights (Wilcoxon signed rank test, n = 14, p < 0.001). The absence of any difference in theta power between experimental and adaptation conditions (Wilcoxon signed rank test, n = 14, p = 0.68) suggests that this response is related to processing of the sound cue itself, not to memory reactivation. Turning to the ERP analysis, we found a small increase in ERP amplitude immediately after TMR onset, followed by a decrease in amplitude 500 ms after the cue. Comparison of ERPs from experimental and adaptation nights showed no significant difference (n = 14, p > 0.1). Similar to the time-frequency result, this suggests that the ERPs observed here relate to the processing of the sound cues rather than any associated memory.”

    And we have updated Figure 2.

    Also, please expand the classification window beyond 1 s for wake and 1.4 s for sleep. It seems the wake axis stops at 1 s and it would be instructive to know how long that lasts beyond 1 s. The sleep signal should also go longer. I suggest plotting it for at least 5 seconds, considering prior investigations (Cairney et al., 2018; Schreiner et al., 2018; Wang et al., 2019) found evidence of reactivation lasting beyond 1.4 s.

    Regarding the classification window, this is an interesting point. TMR cues in sleep were spaced 1.5 s apart, which is why we included only this window in our classification. Extending our window beyond 1.5 s would mean including the time when the next TMR cue was presented. Similarly, in wake the duration of trials was 1.1 s, so the next tone was presented at 1.1 s.

    Following the reviewer’s comment, we have extended our window as requested even though this means encroaching on the next trial. We do this because it could be possible that there is a transitional period between trials. Thus, when we extended the timing in wake and looked at reactivation in the range 0.5 s to 1.6 s, we found that the effect continued to ~1.2 s vs adaptation and chance, i.e. it continued 100 ms after the trial. Results are shown in the figures below.

    Temporal compression/dilation.

    Overall assessment: This could be cut from the paper. If the authors disagree, I am curious how they think it adds novel insight.

    Line 179 section: In my opinion, this does not show evidence for compression or dilation. If anything, it argues that reactivation unfolds on a similar scale, as the numbers are clustered around 1. I suggest the authors scrap this analysis, as I do not believe it supports any main point of their paper. If they do decide to keep it, they should expand the window of dilation beyond 1.4 in Figure 3B (why cut off the graph at a data point that is still significant?). And they should later emphasize that the main conclusion, if any, is that the scales are similar.

    Line 207 section on the temporal structure of reactivation, 1st paragraph: Once again, in my opinion, this whole concept is not worth mentioning here, as there is not really any relevant data in the paper that speaks to this concept.

    We thank the reviewer for these frank comments. On consideration, we have now removed the compression/dilation analysis.

    Behavioral effects.

    Overall assessment: Please provide additional analyses and discussion.

    Lines 171-178: Nice correlation! Was there any correlation between reactivation evidence and pre-sleep performance? If so, could the authors show those data, and also test whether this relationship holds while covarying out pre-sleep performance? The logic is that intact reactivation may rely on intact pre-sleep performance; conversely, there could be an inverse relationship if sleep reactivation is greater for initially weaker traces, as some have argued (e.g., Schapiro et al., 2018). This analysis will either strengthen their conclusion or change it -- either outcome is good.

    Thanks for these interesting points. We have now performed a new analysis to check if there was a correlation between classification performance and pre-sleep performance, but we found no significant correlation (n = 14, r = -0.39, p = 0.17). We have included this in the results section as follows:

    “Finally, we wanted to know whether the extent to which participants learned the sequence during training might predict the extent to which we could identify reactivation during subsequent sleep. We therefore checked for a correlation between classification performance and pre-sleep performance to determine whether the degree of pre-sleep learning predicted the extent of reactivation. This showed no significant correlation (n = 14, r = -0.39, p = 0.17).”

    Note that we calculated the behavioural improvement while subtracting pre-sleep performance and then normalising by it for both the cued and un-cued sequences as follows:

    [(random blocks after sleep - the best 4 blocks after sleep) – (random blocks pre-sleep – the best 4 blocks pre-sleep)] / (random blocks pre-sleep – the best 4 blocks pre-sleep).
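
    The normalisation above can be sketched in a few lines. This is only an illustration under stated assumptions: each block is summarised by a mean reaction time in ms, "best" is taken to mean the fastest blocks, and the function names and example numbers are hypothetical rather than taken from the study.

```python
from statistics import mean


def skill(random_blocks, sequence_blocks, n_best=4):
    """Mean RT of random blocks minus mean RT of the n_best fastest sequence blocks.

    Assumption: lower RT is better, so "best" blocks are the fastest ones.
    """
    best = sorted(sequence_blocks)[:n_best]
    return mean(random_blocks) - mean(best)


def normalised_improvement(pre_random, pre_sequence, post_random, post_sequence):
    """(post-sleep skill - pre-sleep skill) / pre-sleep skill."""
    pre = skill(pre_random, pre_sequence)
    post = skill(post_random, post_sequence)
    return (post - pre) / pre


# Hypothetical block-mean reaction times (ms) for one participant.
improvement = normalised_improvement(
    pre_random=[500, 510],
    pre_sequence=[450, 440, 430, 420, 410],
    post_random=[500, 500],
    post_sequence=[400, 390, 380, 370, 360],
)
print(improvement)
```

    The same function would be applied separately to the cued and the un-cued sequence. Dividing by pre-sleep skill expresses the overnight change relative to where each participant started.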

    Unlike Schönauer et al. (2017), they found a strong correspondence between REM reactivation and memory improvement across sleep; however, there was no benefit of TMR cues overall. These two results in tandem are puzzling. Could the authors discuss this more? What does it mean to have the correlation without the overall effect? Or else, is there anything else that may drive the individual differences they allude to in the Discussion?

    We have now added a discussion of this point as follows:

    “We are at a very early phase in understanding what TMR does in REM sleep; however, we do know that the connection between hippocampus and neocortex is inhibited by the high levels of acetylcholine that are present in REM (Hasselmo, 1999). This means that the reactivation which we observe in the cortex is unlikely to be linked to corresponding hippocampal reactivation, so any consolidation which occurs as a result of this is also unlikely to be linked to the hippocampus. The SRTT is a sequencing task which relies heavily on the hippocampus, and our primary behavioural measure (Sequence Specific Skill) specifically examines the sequencing element of the task. Our own neuroimaging work has shown that TMR in non-REM sleep leads to extensive plasticity in the medial temporal lobe (Cousins et al., 2016). However, if TMR in REM sleep has no impact on the hippocampus then it is quite possible that it elicits cortical reactivation and leads to cortical plasticity but provides no measurable benefit to Sequence Specific Skill. Alternatively, because we only measured behavioural improvement right after sleep, it is possible that we may have missed behavioural improvements that would have emerged several days later, as we know can occur in this task (Rakowska et al., 2021).”

    Medium-level comments

    Lines 63-65: "We used two sequences and replayed only one of them in sleep. For control, we also included an adaptation night in which participants slept in the lab, and the same tones that would later be played during the experimental night were played."

    I believe the authors could make a stronger point here: their design allowed them to show that they are not simply decoding SOUNDS but actual memories. The null finding on the adaptation night is definitely helpful in ruling this possibility out.

    We agree and would like to thank the reviewer for this point. We have now included this in the text as follows: “This provided an important control, as a null finding from this adaptation night would ensure that we are decoding actual memories, not just sounds.”

    Lines 129-141: Does reactivation evidence go down (like in their prior study, Belal et al., 2018)? All they report is theta activity rather than classification evidence. Also, I am unclear why the Wilcoxon comparison was performed rather than a simple correlation in theta activity across TMR cues (though again, it makes more sense to me to investigate reactivation evidence across TMR cues instead).

    Thanks a lot for the interesting point. In our prior study (Belal et al., 2018), the classification model was trained on wake data and then tested on sleep data, which enabled us to examine its performance at different timepoints in sleep. However, in the current study the classifier was trained on sleep and tested on wake, so we can only test for differential replay at different times during the night by dividing the training data. We fear that dividing sleep trials into smaller blocks in this way will lead to weakly trained classifiers with inaccurate weight estimation due to the few training trials, and that these will not be generalisable to testing data. Nevertheless, following your comment, we tried this by dividing our sleep trials into two blocks, i.e. the first half of stimulation during the night and the second half of stimulation during the night. When we ran the analysis on these blocks separately, no clusters were found for either the first or second half of stimulation compared to adaptation, probably due to the reasons cited above. Hence the differences in design between the two studies mean that the current study does not lend itself to this analysis.

    Line 201: It seems unclear whether they should call this "wake-like activity" when the classifier involved training on sleep first and then showing it could decode wake rather than vice versa. I agree with the authors' logic that wake signals that are specific to wake will be unhelpful during sleep, but I am not sure "wake-like" fits here. I'm not going to belabor this point, but I do encourage the authors to think deeply about whether this is truly the term that fits.

    We agree that better terminology is needed, and have now changed this to: “In this paper we demonstrated that memory reactivation after TMR cues in human REM sleep can be decoded using EEG classifiers. Such reactivation appears to be most prominent about one second after the sound cue onset.”

    Reviewer #3 (Public Review):

    The authors investigated whether reactivation of wake EEG patterns associated with left- and right-hand motor responses occurs in response to sound cues presented during REM sleep.

    The question of whether reactivation occurs during REM is of substantial practical and theoretical importance. While some rodent studies have found reactivation during REM, it has generally been more difficult to observe reactivation during REM than during NREM sleep in humans (with a few notable exceptions, e.g., Schönauer et al., 2017), and the nature and function of memory reactivation in REM sleep is much less well understood than the nature and function of reactivation in NREM sleep. Finding a procedure that yields clear reactivation in REM in response to sound cues would give researchers a new tool to explore these crucial questions.

    The main strength of the paper is that the core reactivation finding appears to be sound. This is an important contribution to the literature, for the reasons noted above.

    The main weakness of the paper is that the ancillary claims (about the nature of reactivation) may not be supported by the data.

    The claim that reactivation was mediated by high theta activity requires a significant difference in reactivation between trials with high theta power and trials with low theta, but this is not what the authors found (rather, they have a "difference of significances", where results were significant for high theta but not low theta). So, at present, the claim that theta activity is relevant is not adequately supported by the data.

    The authors claim that sleep replay was sometimes temporally compressed and sometimes dilated compared to wakeful experience, but I am not sure that the data show compression and dilation. Part of the issue is that the methods are not clear. For the compression/dilation analysis, what are the features that are going into the analysis? Are the feature vectors patterns of power coefficients across electrodes (or within single electrodes?) at a single time point? or raw data from multiple electrodes at a single time point? If the feature vectors are patterns of activity at a single time point, then I don't think it's possible to conclude anything about compression/dilation in time (in this case, the observed results could simply reflect autocorrelation in the time-point-specific feature vectors - if you have a pattern that is relatively stationary in time, then compressing or dilating it in the time dimension won't change it much). If the feature vectors are spatiotemporal patterns (i.e., the patterns being fed into the classifier reflect samples from multiple frequencies/electrodes / AND time points) then it might in principle be possible to look at compression, but here I just could not figure out what is going on.

    Thank you. We have removed the analysis of temporal compression and dilation from the manuscript, but we would still like to answer the question. In this analysis, raw data were smoothed and used as time-domain features. The data were organised as trials x channels x timepoints, and each trial was then segmented in time according to the compression factor being tested. For instance, to test whether sleep is 2x faster than wake, we took the wake trial length (1.1 s) and halved it to give 0.55 s. We then took windows of that length at different times within the sleep data, so that each sleep trial yielded multiple shorter segments of 0.55 s each; these segments were added as new trials and labelled with the label of the trial they came from. Afterwards, we resized those segments temporally to match the length of wake trials. We then reshaped the data from trials x channels x timepoints to trials x channels_timepoints, aggregating channels and timepoints into one dimension. PCA was applied to reduce the dimensionality of channels_timepoints into principal components, and the resulting features were fed to an LDA classifier. This whole process was repeated for every scaling factor, within participant, in the same fashion as the main classification; error bars were standard errors. We compared the results from the experimental night with those of the adaptation night.
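
    The segmentation, resizing and flattening steps described above can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes non-overlapping segments, linear interpolation for the temporal resizing, and a hypothetical 100 Hz sampling rate (so a 1.1 s wake trial is 110 samples). The smoothing, PCA and LDA stages are not reproduced here; the returned feature matrix is what would be passed on to them.

```python
import numpy as np


def segment_and_resize(sleep, labels, wake_len, factor):
    """Cut sleep trials into segments of wake_len / factor samples and
    stretch each segment back to wake_len samples.

    sleep  : array of shape (trials, channels, timepoints)
    labels : one class label per sleep trial
    Returns features of shape (segments, channels * wake_len) and the
    label inherited by each segment.
    """
    n_time = sleep.shape[2]
    seg_len = int(round(wake_len / factor))
    n_segs = n_time // seg_len  # assumption: non-overlapping segments
    old_t = np.linspace(0.0, 1.0, seg_len)
    new_t = np.linspace(0.0, 1.0, wake_len)

    feats, seg_labels = [], []
    for trial, label in zip(sleep, labels):
        for s in range(n_segs):
            seg = trial[:, s * seg_len:(s + 1) * seg_len]
            # Resize every channel in time to match the wake trial length.
            resized = np.stack([np.interp(new_t, old_t, ch) for ch in seg])
            # Aggregate channels and timepoints into one feature dimension.
            feats.append(resized.reshape(-1))
            seg_labels.append(label)
    return np.asarray(feats), np.asarray(seg_labels)


# Hypothetical data: 3 sleep trials, 2 channels, 150 samples (1.5 s at 100 Hz).
rng = np.random.default_rng(0)
sleep = rng.standard_normal((3, 2, 150))
labels = np.array([0, 1, 0])

# Test the "sleep is 2x faster than wake" hypothesis: 110 / 2 = 55 samples.
X, y = segment_and_resize(sleep, labels, wake_len=110, factor=2)
print(X.shape, y)
```

    Because every segment inherits the label of its parent trial, segments from the same trial are not independent, which matters for how any cross-validation folds would be drawn.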

    For the analyses relating to classification performance and behavior, the authors presently show that there is a significant correlation for the cued sequence but not for the other sequence. This is a "difference of significances" but not a significant difference. To justify the claim that the correlation is sequence-specific, the authors would have to run an analysis that directly compares the two sequences.

    Thanks a lot. We have now followed this suggestion by examining the sequence-specific improvement after removing the effect of the un-cued sequence from the cued sequence. This was done by subtracting the improvement of the un-cued sequence from the improvement for the cued sequence, and then normalising the result by the improvement of the un-cued sequence. The resulting values, which we term ‘cued sequence improvement’, showed a significant correlation with classification performance (n = 14, r = 0.56, p = 0.04). We have therefore amended this section of the manuscript as follows: “We therefore set out to determine whether there was a relationship between the extent to which we could classify reactivation and overnight improvement on the cued sequence. This revealed a positive correlation (n = 14, r = 0.56, p = 0.04), Figure 3b.”
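
    As a rough sketch of the computation just described: the response does not say which correlation coefficient was used, so Pearson's r is assumed here, and the function names and per-participant values are hypothetical.

```python
import numpy as np


def cued_sequence_improvement(cued, uncued):
    """(cued improvement - un-cued improvement) / un-cued improvement."""
    cued = np.asarray(cued, dtype=float)
    uncued = np.asarray(uncued, dtype=float)
    return (cued - uncued) / uncued


def pearson_r(x, y):
    """Pearson correlation coefficient (assumed; the source does not name the test)."""
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical improvements and classification performance for 3 participants.
cued = [0.6, 0.9, 1.2]
uncued = [0.5, 0.6, 0.6]
classification = [0.52, 0.55, 0.60]

csi = cued_sequence_improvement(cued, uncued)
r = pearson_r(csi, classification)
print(csi, r)
```

    One caveat of this kind of ratio is that it becomes unstable when the un-cued improvement is close to zero, so the denominators are worth inspecting before interpreting the correlation.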

  2. eLife assessment

    This valuable work in human subjects reports that sounds that were associated with specific memories during waking behaviors can trigger the reactivation of these memory representations during REM sleep. However, the evidence supporting the conclusions is currently incomplete. Still, the work has the potential to expand our understanding of memory processing during sleep.

  3. Reviewer #1 (Public Review):

    Abdellahi et al. used targeted memory reactivation (TMR) and machine learning tools to look for evidence that waking neural activity is reinstated during subsequent REM sleep. Prior work has demonstrated that learning content is successfully decoded following TMR cues during NREM sleep, but a direct link between patterns of brain activity recorded during wakefulness and subsequent REM sleep in humans has never been reported. In this paper, the authors report that an LDA classifier detects wake-like neural activity (specifically, neural activity recorded while imagining performing a trained serial reaction time task) approximately one second after TMR cues are presented during REM sleep. Decoding performance is better when the classifier is trained on sleep trials with high theta compared to low theta power, and classifier performance was correlated with overnight improvement on the task.

    Finding evidence of reinstated waking neural activity during REM sleep is an exciting result, and the authors present a promising method that holds implications for advancing our understanding of how memories are reprocessed during REM sleep. I think it is a particular strength of the paper that the authors trained on sleep data and tested in wake data, which is analogous to prior rodent studies that found evidence of replay during REM. I also thought playing sounds during the adaptation night, prior to SRTT training, provided a nice control.

    The conclusions of this paper are mostly supported by the results presented, but it is not always clear how those results were obtained. Some aspects of the experimental and data analytic methods need to be clarified and expanded, both for a better understanding of how the results of this study were obtained, as well as for future reproducibility.

  4. Reviewer #2 (Public Review):

    I believe the authors succeeded in finding neural evidence of reactivation during REM sleep. This is their main claim, and I applaud them for that. I also applaud their efforts to explore their data beyond this claim, and I think they included appropriate controls in their experimental design. However, I found other aspects of the paper to be unclear or lacking in support. I include major and medium-level comments:

    Major comments, grouped by theme with specifics below:
    Theta.
    Overall assessment: the theta effects are either over-emphasized or unclear. Please either remove the high/low theta effects or provide a better justification for why they are insightful.

    Lines ~ 115-121: Please include the statistics for low-theta power trials. Also, without a significant difference between high- and low-theta power trials, it is unclear why this analysis is being featured. Does theta actually matter for classification accuracy?

    Lines 123-128: What ARE the important bands for classification? I understand the point about it overlapping in time with the classification window without being discriminative between the conditions, but it still is not clear why theta is being featured given the non-significant differences between high/low theta and the lack of its involvement in classification. REM sleep is high in theta, but other than that, I do not understand the focus given this lack of empirical support for its relevance.

    Line 232-233: "8). In our data, trials with higher theta power show greater evidence of memory reactivation." Please do not use this language without a difference between high and low theta trials. You can say there was significance using high theta power and not with low theta power, but without the contrast, you cannot say this.

    Physiology / Figure 2.
    Overall assessment: It would be helpful to include more physiological data.

    It would be nice, either in Figure 2 or in the supplement, to see the raw EEG traces in these conditions. These would be especially instructive because, with NREM TMR, the ERPs seem to take a stereotypical pattern that begins with a clear influence of slow oscillations (e.g., in Cairney et al., 2018), and it would be helpful to show the contrast here in REM. Also, please expand the classification window beyond 1 s for wake and 1.4 s for sleep. It seems the wake axis stops at 1 s and it would be instructive to know how long that lasts beyond 1 s. The sleep signal should also go longer. I suggest plotting it for at least 5 seconds, considering prior investigations (Cairney et al., 2018; Schreiner et al., 2018; Wang et al., 2019) found evidence of reactivation lasting beyond 1.4 s.

    Temporal compression/dilation.
    Overall assessment: This could be cut from the paper. If the authors disagree, I am curious how they think it adds novel insight.

    Line 179 section: In my opinion, this does not show evidence for compression or dilation. If anything, it argues that reactivation unfolds on a similar scale, as the numbers are clustered around 1. I suggest the authors scrap this analysis, as I do not believe it supports any main point of their paper. If they do decide to keep it, they should expand the window of dilation beyond 1.4 in Figure 3B (why cut off the graph at a data point that is still significant?). And they should later emphasize that the main conclusion, if any, is that the scales are similar.

    Line 207 section on the temporal structure of reactivation, 1st paragraph: Once again, in my opinion, this whole concept is not worth mentioning here, as there is not really any relevant data in the paper that speaks to this concept.

    Behavioral effects.
    Overall assessment: Please provide additional analyses and discussion.

    Lines 171-178: Nice correlation! Was there any correlation between reactivation evidence and pre-sleep performance? If so, could the authors show those data, and also test whether this relationship holds while covarying out pre-sleep performance? The logic is that intact reactivation may rely on intact pre-sleep performance; conversely, there could be an inverse relationship if sleep reactivation is greater for initially weaker traces, as some have argued (e.g., Schapiro et al., 2018). This analysis will either strengthen their conclusion or change it -- either outcome is good.

    Unlike Schönauer et al. (2017), they found a strong correspondence between REM reactivation and memory improvement across sleep; however, there was no benefit of TMR cues overall. These two results in tandem are puzzling. Could the authors discuss this more? What does it mean to have the correlation without the overall effect? Or else, is there anything else that may drive the individual differences they allude to in the Discussion?

    Medium-level comments
    Lines 63-65: "We used two sequences and replayed only one of them in sleep. For control, we also included an adaptation night in which participants slept in the lab, and the same tones that would later be played during the experimental night were played."

    I believe the authors could make a stronger point here: their design allowed them to show that they are not simply decoding SOUNDS but actual memories. The null finding on the adaptation night is definitely helpful in ruling this possibility out.

    Lines 129-141: Does reactivation evidence go down (like in their prior study, Belal et al., 2018)? All they report is theta activity rather than classification evidence. Also, I am unclear why the Wilcoxon comparison was performed rather than a simple correlation in theta activity across TMR cues (though again, it makes more sense to me to investigate reactivation evidence across TMR cues instead).

    Line 201: It seems unclear whether they should call this "wake-like activity" when the classifier involved training on sleep first and then showing it could decode wake rather than vice versa. I agree with the authors' logic that wake signals that are specific to wake will be unhelpful during sleep, but I am not sure "wake-like" fits here. I'm not going to belabor this point, but I do encourage the authors to think deeply about whether this is truly the term that fits.

  5. Reviewer #3 (Public Review):

    The authors investigated whether reactivation of wake EEG patterns associated with left- and right-hand motor responses occurs in response to sound cues presented during REM sleep.

    The question of whether reactivation occurs during REM is of substantial practical and theoretical importance. While some rodent studies have found reactivation during REM, it has generally been more difficult to observe reactivation during REM than during NREM sleep in humans (with a few notable exceptions, e.g., Schönauer et al., 2017), and the nature and function of memory reactivation in REM sleep is much less well understood than the nature and function of reactivation in NREM sleep. Finding a procedure that yields clear reactivation in REM in response to sound cues would give researchers a new tool to explore these crucial questions.

    The main strength of the paper is that the core reactivation finding appears to be sound. This is an important contribution to the literature, for the reasons noted above.

    The main weakness of the paper is that the ancillary claims (about the nature of reactivation) may not be supported by the data.

    The claim that reactivation was mediated by high theta activity requires a significant difference in reactivation between trials with high theta power and trials with low theta, but this is not what the authors found (rather, they have a "difference of significances", where results were significant for high theta but not low theta). So, at present, the claim that theta activity is relevant is not adequately supported by the data.

    The authors claim that sleep replay was sometimes temporally compressed and sometimes dilated compared to wakeful experience, but I am not sure that the data show compression and dilation. Part of the issue is that the methods are not clear. For the compression/dilation analysis, what are the features that are going into the analysis? Are the feature vectors patterns of power coefficients across electrodes (or within single electrodes?) at a single time point? or raw data from multiple electrodes at a single time point? If the feature vectors are patterns of activity at a single time point, then I don't think it's possible to conclude anything about compression/dilation in time (in this case, the observed results could simply reflect autocorrelation in the time-point-specific feature vectors - if you have a pattern that is relatively stationary in time, then compressing or dilating it in the time dimension won't change it much). If the feature vectors are spatiotemporal patterns (i.e., the patterns being fed into the classifier reflect samples from multiple frequencies/electrodes / AND time points) then it might in principle be possible to look at compression, but here I just could not figure out what is going on.

    For the analyses relating to classification performance and behavior, the authors presently show that there is a significant correlation for the cued sequence but not for the other sequence. This is a "difference of significances" but not a significant difference. To justify the claim that the correlation is sequence-specific, the authors would have to run an analysis that directly compares the two sequences.