Uncertainty alters the balance between incremental learning and episodic memory

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This paper posits that higher uncertainty environments should lead to more reliance on episodic memory, finding compelling evidence for this idea across several analysis approaches and across two independent samples. This is an important paper that will be of interest to a broad group of learning, memory, and decision-making researchers.

Abstract

A key question in decision-making is how humans arbitrate between competing learning and memory systems to maximize reward. We address this question by probing the balance between the effects, on choice, of incremental trial-and-error learning versus episodic memories of individual events. Although a rich literature has studied incremental learning in isolation, the role of episodic memory in decision-making has only recently drawn focus, and little research disentangles their separate contributions. We hypothesized that the brain arbitrates rationally between these two systems, relying on each in circumstances to which it is most suited, as indicated by uncertainty. We tested this hypothesis by directly contrasting contributions of episodic and incremental influence to decisions, while manipulating the relative uncertainty of incremental learning using a well-established manipulation of reward volatility. Across two large, independent samples of young adults, participants traded these influences off rationally, depending more on episodic information when incremental summaries were more uncertain. These results support the proposal that the brain optimizes the balance between different forms of learning and memory according to their relative uncertainties and elucidate the circumstances under which episodic memory informs decisions.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    This paper tests whether people vary their reliance on episodic memory vs. incremental learning as a function of the uncertainty of the environment. The authors posit that higher uncertainty environments should lead to more reliance on episodic memory, and they find evidence for this effect across several kinds of analyses and across two independent samples.

    The paper is beautifully written and motivated, and the results and figures are clear and compelling. The replication in an independent sample is especially useful. I think this will be an important paper of interest to a broad group of learning, memory, and decision-making researchers. I have only two points of concern about the interpretation of the results:

    1. My main concern regards the indirect indicator of participants' use of episodic memory on a given trial. The authors assume that episodic memory is used if the value of the chosen object (as determined by its value the last time it was presented) does not match the current value of the deck it is presented in. They find that these mismatch choices happen more often in the high-volatility environment. But if participants simply choose in a more noisy/exploratory way in the high volatility environment, I believe that would also result in more mismatched judgments. What proportion of the trials labeled as episodic should we expect to be a result of noise or exploration? It seems conceivable that a judgment to explore could take longer, and result in the observed RT effects. Perhaps it could be useful to match up putative episodic trials with later recognition memory for those particular items. The across-subjects correlations are an indirect version of this, but could potentially be subject to a related concern if participants who explore more (and are then judged as more episodic) also simply have a better memory.

    Thank you for this important suggestion. We agree that noisy/exploratory choices could potentially masquerade as episodic on the episodic-based choice index used as one of our behavioral measures. As pointed out, this is because participants may be more likely to make noisier incremental value-based decisions in the high volatility compared to the low volatility environment. In our revision, we provided a new analysis that shows that, as the reviewer predicted, choices are indeed more noisy in the high volatility environment. We answer this concern in two ways. First, we took this noise into account in our analysis of the episodic/incremental tradeoff and show that it does not account for the main findings. And second, we provided a new analysis of subsequent memory that shows that choices that are defined as episodic during the decision-task are also associated with better recognition memory later on. These new analyses are described below as well.

    We used a mixed-effects logistic regression model to test for an interaction effect of environment and model-estimated deck value on whether the orange deck was chosen. We fit this model only to trials without the presence of a previously seen object in order to achieve a more accurate measure of noise specific to incremental learning. In both the main and replication samples, participants did indeed make noisier incremental decisions in the high compared to the low volatility environment (Main: 𝛽 = −1.589, 95% 𝐶𝐼 = [−2.091, −1.096], Replication: 𝛽 = −1.255, 95% 𝐶𝐼 = [−1.824, −0.675]). To account for the possibility that the measured difference between environments in our episodic-based choice index may be related to this difference in incremental noise between the environments, we included each participant’s random effect of the environment by deck value interaction from this model as a covariate in our analysis of the effect of environment on the episodic-based choice index. While each participant’s propensity to choose with greater noise in the high volatility environment did have an effect on the episodic-based choice index (Main: 𝛽 = 0.042, 95% 𝐶𝐼 = [0.012, 0.072], Replication: 𝛽 = 0.055, 95% 𝐶𝐼 = [0.027, 0.082]), the effect of environment was similar to that originally reported in the manuscript for both samples following this adjustment. The reported effects (line 178 and Appendix 1) and methods (lines 643-655) have been updated to reflect these changes.
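
    The logic of this control analysis can be illustrated with a minimal sketch. It is not the fitted mixed-effects model: it contains fixed effects only, and the coefficient values, the 0/1 coding of environment, and the function names are hypothetical choices made for illustration. The point it shows is that a negative environment by deck value interaction flattens the relationship between deck value and choice in the high volatility environment, which is what noisier incremental choice looks like.

```python
import math


def p_choose_orange(deck_value, high_volatility, b0=0.0, b_value=3.0,
                    b_env=0.0, b_interaction=-1.5):
    """Probability of choosing the orange deck under a logistic choice rule.

    deck_value: model-estimated value of the orange deck, centred on 0.
    high_volatility: 1 for the high volatility environment, 0 for low.
    All coefficients are illustrative, not the estimates from the paper.
    """
    logit = (b0
             + b_value * deck_value
             + b_env * high_volatility
             + b_interaction * deck_value * high_volatility)
    return 1.0 / (1.0 + math.exp(-logit))


def value_slope(high_volatility, b_value=3.0, b_interaction=-1.5):
    """Slope of the logit on deck value within one environment."""
    return b_value + b_interaction * high_volatility
```

    With these illustrative values the slope on deck value is shallower in the high volatility environment, so the same deck value produces a choice probability closer to chance.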

    We applied a similar logic to the reaction time analysis, to address the possibility that decisions based on exploration may take longer compared to decisions based on exploitation of learned deck value. We included a covariate in the analysis of the effect of episodic-based choices on reaction time that captured possible slowing due to switching from choosing one deck to the other (lines 656-662) and found that the slower reaction times on episodic choices are not fully explained by exploration. Because in this task a decision to explore is captured by switching from one deck to another, the effect of episodic-based choices on reaction time reported in the manuscript should account for this behavior. We have clarified this reasoning in the methods (lines 661-662).

    Finally, thank you for the idea to sort objects in the recognition memory test by whether they were from episodic- or incremental-based choice trials to provide a further test of whether our approach for sorting episodic decisions withstands an independent test. We performed this analysis and found that, in both samples, participants had better memory for objects from episodic-based choice trials. This result provides further support for the putative episodic nature of these trials and is now reported in the Results (lines 300-304 and Appendix 1), Methods (lines 737-742) and appears as a new panel in Figure 5 (Figure 5A).

    2. The paper is framed as tapping into a trade-off between the use of episodic memory vs. incremental learning, but it is not clear why participants would not use episodic memory in this particular task setup whenever it is available to them. The authors mention that there is "computational expense" to episodic memory, but retrieval of an already-established strong episodic memory could be quite effortless and even automatic. Why not always use it, since it is guaranteed in this task to be a better source of information for the decision? If it is true that RT is higher when using episodic memory, that is helpful toward establishing the trade-off, so this links to the concern above about how confident we can be about the use of episodic memory in particular trials.

    Thank you for raising this important point and for giving us the opportunity to clarify. We now address this point in two ways: first, we provide a new analysis of episodic memory and choice behavior, and second, we address the point explicitly in the discussion.

    As now emphasized in the paper (lines 118-122 and lines 384-388), in this task, it is true that an observer with perfect episodic memory should always make use of it whenever available (i.e. on trials featuring previously seen objects). However, human memory is fallible and resource-limited, and we find that participants with less reliable episodic memory overall actually relied less on this strategy and more on incremental learning throughout the task (Figure 5C and 5D). In other words, there is also noise and uncertainty in the episodic memory trace. While it is not the main focus of our study, the noise in episodic memory is indeed another reason why trading off between episodic memory and incremental learning is advantageous for behavior. We further agree that while the RT effects show that, relative to using incremental value, episodic memory retrieval takes longer, we cannot make strong statements about effort or “computational expense” per se from our data. Accordingly, we have removed the “computational expense” phrase (line 491), as well as our suggestion that episodic retrieval is “perhaps more effortful overall” (line 181), from the paper.

    Reviewer #2 (Public Review):

    This manuscript addresses the broad question of when humans use different learning and memory systems in the service of decision-making. Previous studies have shown that, even in tasks that can be performed well using incremental trial-and-error learning, choices can sometimes be based on memories of individual past episodes. This manuscript asks what determines the balance between incremental learning and episodic memory, and specifically tests the idea that the uncertainty associated with each alters the balance between them in a rational way. Using a task that can separate the influence of incremental learning and episodic memory on choice in two large online samples, several lines of evidence supporting this hypothesis are reported. People are more likely to rely on episodic memory in more volatile environments when incremental learning is more uncertain and during periods of increased uncertainty within a given environment. Individuals with more accurate episodic memories are also more likely to rely on episodic memory and less likely to rely on incremental learning. These data are compelling, even more so because all of the main findings are directly replicated in a second sample. These data extend the notion of uncertainty-based arbitration between different forms of learning/memory, which has been proposed and evaluated in other contexts, to the case of episodic memory versus incremental learning.

    The weaknesses in the paper are mostly minor. One potential weakness is the nature of the online sample. Many participants apparently did not respond to the volatility manipulation, making it impossible to test whether this altered their choices. It is unclear whether this is a feature of online samples (where people can be distracted, unmotivated, etc.) or of human performance more generally.

    Thank you for your comments. Indeed, we also found it interesting that many participants were insensitive to the manipulation of volatility in our study, as assessed and filtered based on the initial deck learning task. As you note, our study is not positioned to determine the cause and whether this is due to the online population or human performance more generally, and we added a discussion of this point to the paper (lines 477-485). Also, in our experience with other online studies across many tasks, fractions of apparently inattentive participants exceeding 1/3 are very much the norm. While there is much to say about the implications of this (see e.g. Zorowitz, Niv & Bennett PsyArXiv 2021), our basic philosophy (which we follow here) is that it is best practice, and conservative, to exclude aggressively so as to focus analyses on those participants for whom the experimental questions can meaningfully be asked.

    Reviewer #3 (Public Review):

    The purpose of this work is to test the hypothesis that uncertainty modulates the relative contributions of episodic and incremental learning to decisions. The authors test this using a "deck learning and card memory task" featuring a 2-alternative forced choice between two cards, each showing a color and an object. The cards are drawn from different colored decks with different average values that stochastically reverse with fixed volatility, and also feature objects that can be unfamiliar or familiar. Objects are not shown more than twice, and familiar objects have the same value as they did when shown previously. This allows the authors to construct an index of episodic contributions to decision-making: in cases where the previous value of the object is incongruous with the incrementally observed value, the subject's choice reveals which strategy they are relying on.
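
    The index described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scoring code: the `Trial` fields, the 0.5 value threshold, and the function name are hypothetical. Only trials where the object's earlier value and the incremental deck estimate recommend opposite choices are scored, and the index is the share of those trials on which the choice followed the object's earlier value.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    chose_old_card: bool        # picked the card showing the familiar object
    old_object_value: float     # value the object had when first seen (assumed 0..1)
    old_card_deck_value: float  # incremental estimate for the deck the old card is on
    other_deck_value: float     # incremental estimate for the competing deck


def episodic_choice_index(trials, value_threshold=0.5):
    """Share of incongruent trials on which choice followed the object's value."""
    scored = []
    for t in trials:
        episodic_says_take = t.old_object_value > value_threshold
        incremental_says_take = t.old_card_deck_value > t.other_deck_value
        if episodic_says_take == incremental_says_take:
            continue  # congruent trial: the choice cannot separate the strategies
        scored.append(1.0 if t.chose_old_card == episodic_says_take else 0.0)
    # No incongruent trials means the index is undefined.
    return sum(scored) / len(scored) if scored else float("nan")
```

    Restricting the score to incongruent trials is what lets a single choice be attributed to one strategy, although noisy choices can still land in the episodic category, as Reviewer #1 notes.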

    The key manipulation is to introduce high- and low- volatility conditions, as high volatility has been shown to induce uncertainty in incremental learning by causing subjects to adopt an optimal low learning rate. The authors find that the subjects show a higher episodic choice index in the high-volatility condition, and in particular immediately after reversals when the model predicts uncertainty is at a maximum. The authors also construct a trial-wise index of uncertainty and show that episodic index correlates with this measure. The authors also find that at the subject level, the overall episodic choice index correlates with the ability to accurately identify familiar objects, and they reason that this indicates that higher certainty in episodic memory predicts the usage of episodic strategies. The authors replicate all of their findings in a second subject population.

    This is a very interesting study with compelling results on an important topic. The task design was a clever way to disentangle and measure different learning strategies, which could be adopted by others seeking to further understand the contributions of different strategies to decision-making and its neural underpinnings. The article is also very clearly written and the results clearly communicated.

    A number of questions remain regarding the interpretation of the results that I think would be addressed with further analysis and modeling.

    At a conceptual level, I was unsure about the equivalence drawn between volatility and uncertainty: the main experiments and analyses all regard reversals and comparisons of volatility conditions, but the conclusions are more broadly about uncertainty. Volatility, as the authors note, is only one way to induce uncertainty. It also doesn't seem like the most obvious way to intervene on uncertainty (eg manipulated trial-wise variance seems more obvious). The trial-wise relative uncertainty measurements in Fig 4 speak a bit more to the question of uncertainty more generally, but these were not the main focus and also do not disambiguate between trial-wise uncertainty derived from reversals versus within block variation.

    Thank you for your comments. We agree that this distinction was unclear and appreciate the opportunity to clarify. We hope the manuscript is now clear about the conceptual distinction between uncertainty as the construct of theoretical interest vs. volatility as the operational manipulation being used to access it. We have adjusted the presentation and added discussion to clarify this, and also enhanced the trial-wise analyses to strengthen the interpretation of results in terms of uncertainty more generally. Regarding obviousness, we think perhaps there is a difference between areas of study on this point. While trial-wise outcome variance (which we call stochasticity) has been widely used to manipulate uncertainty in perceptual and sensorimotor studies, it has been more rarely manipulated in reward learning studies, where instead the volatility manipulation we use has predominated. We have a recent paper reviewing examples of both and arguing that the field has underemphasized the importance of stochasticity, so we are sympathetic here (Piray and Daw, Nature Communications 2021).

    In any case, to address these points on revision, we have reframed the first section of the results, where we look at effects of environment on episodic-based choice, to focus primarily on volatility. Specifically, we have expanded on our explanation of how volatility induces uncertainty, changed the subtitle of the section from ‘uncertainty’ to ‘volatility’, and have specified that the prediction in this section is primarily about volatility (lines 97 and 116-123). We also reframed the second section of the results to be primarily about the uncertainty induced by volatility: while differences between the environments capture coarse effects of volatility, trial-wise uncertainty should be present following reversals across both environments. We have now focused our explanation in this section on trial-wise uncertainty within the environments rather than volatility between the environments (lines 184-192). Further, we agree that there are other sources of uncertainty besides volatility that we did not manipulate in the paper, and that it remains for future work whether their manipulation would produce similar results. To amend this, we have added a new paragraph to the discussion covering these alternative sources and further qualifying the scope of our conclusions (lines 434-446).
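
    The distinction between volatility and stochasticity drawn in this response can be made concrete with a textbook Kalman filter. This is a sketch under simplifying assumptions, not a model fitted in the paper: it treats value as a Gaussian random walk rather than a reversing binary state, and the function name and parameter values are hypothetical. Volatility enters as process variance and stochasticity as observation variance.

```python
def steady_state_kalman(volatility, stochasticity, n_trials=500):
    """Iterate a scalar Kalman filter until it settles.

    volatility: variance of the trial-to-trial drift in the true value.
    stochasticity: variance of the outcome noise around the true value.
    Returns (learning_rate, posterior_variance) after n_trials updates.
    """
    posterior_var = 1.0  # arbitrary starting uncertainty
    gain = 0.0
    for _ in range(n_trials):
        # Uncertainty grows between trials because the value may have drifted.
        predicted_var = posterior_var + volatility
        # The Kalman gain plays the role of the learning rate.
        gain = predicted_var / (predicted_var + stochasticity)
        # Observing an outcome shrinks uncertainty again.
        posterior_var = (1.0 - gain) * predicted_var
    return gain, posterior_var
```

    In this idealized setting, raising volatility increases both the learning rate and the residual uncertainty about value, whereas raising stochasticity lowers the learning rate, which is why the two sources of uncertainty are worth keeping apart.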

    We also agree that our analyses in Figure 4 did not yet speak to differences in episodic-based choice that may arise due to blockwise volatility (as captured by the categorical effect of environment) vs. trial-to-trial fluctuations in uncertainty (as captured by relative uncertainty, over and above the blockwise effect). We have addressed this by adding an additional, separate effect of the interaction between environment and episodic value to our combined choice models which is explained in more detail in the recommendations for the authors portion of our response. These changes and results are described in the Methods (lines 686-694) and Results (lines 276-277; Figure 4C).

    Another key question I had about design choice was the decision to use binary rather than drifting values. Because of this, the subjects could be inferring context rather than continuously incrementing value estimates (eg Gershman et al 2012, Akam et al 2015): the subjects could be inferring which context they are in rather than tracking the instantaneous value + uncertainty. I am not sure this would qualitatively affect the results, as volatility would also affect context confidence, but it is a rather different interpretation and could invoke different quantitative predictions. And it might also have some qualitative bearing on results: the subjects have expectations about how long they will stay in a particular environment, and they might start anticipating a context change after a certain amount of time which would lead to an increase in uncertainty not just immediately after switches, but also after having stayed in the environment for a long period of time. Moreover, depending on the variance within context, there may be little uncertainty following context shifts.

    Thank you for raising this important point. To address the possibility that the task structure could have encouraged participants to infer context rather than engage in incremental learning, we added an alternative contextual inference (CI) model, based on a hidden Markov model with two hidden states (e.g. that either the red deck is lucky and the blue deck unlucky or vice versa). This model is now described in the Results of the main text (lines 226-228), listed in the Methods (line 674), and explained in detail in Appendix 3 alongside the computational models of incremental learning. Following model comparison, we found that this model provided a worse fit than the incremental learning models we previously presented in both samples, suggesting that incremental learning is a better descriptor of participants’ choices in this task than contextual inference. The results of this comparison are reflected in an updated Figure 3A.
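
    For readers unfamiliar with this class of model, the belief update in a two-state hidden Markov model can be sketched as below. This is an illustrative simplification, not the CI model specified in Appendix 3: outcomes are reduced to a binary good/bad signal, and the hazard rate, the outcome probability, and the function name are hypothetical.

```python
def update_belief(belief_red_lucky, chose_red, good_outcome,
                  hazard=0.1, p_good_if_lucky=0.8):
    """One forward-filtering step for a two-state hidden Markov model.

    The hidden state is either "red deck lucky" or "blue deck lucky".
    Returns the posterior probability that the red deck is lucky.
    """
    # Transition step: the lucky deck may have reversed since the last trial.
    prior = (belief_red_lucky * (1.0 - hazard)
             + (1.0 - belief_red_lucky) * hazard)

    # Probability of a good outcome from the chosen deck under each state.
    p_good_if_red_lucky = p_good_if_lucky if chose_red else 1.0 - p_good_if_lucky
    p_good_if_blue_lucky = 1.0 - p_good_if_red_lucky

    # Likelihood of the observed outcome under each state.
    if good_outcome:
        like_red, like_blue = p_good_if_red_lucky, p_good_if_blue_lucky
    else:
        like_red, like_blue = 1.0 - p_good_if_red_lucky, 1.0 - p_good_if_blue_lucky

    # Bayes' rule over the two states.
    evidence = prior * like_red + (1.0 - prior) * like_blue
    return prior * like_red / evidence
```

    Unlike an incremental learner, which nudges a value estimate by a fraction of each prediction error, this observer tracks a probability over two discrete contexts, so its beliefs can flip within a few trials of a reversal.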
