Resource-rational account of sequential effects in human prediction

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This work is relevant to understanding how people represent uncertain events in the world around them and make decisions, with broad applications to economic behavior. It addresses a long-standing empirical puzzle from a novel perspective, where the authors propose that sequential effects in perceptual decisions may emerge from rational choices under cognitive resource constraints rather than adjustments to changing environments. Two new computational models have been constructed to predict behavior under two different constraints, among which the one assuming higher cost for more precise beliefs is better supported by new experimental data. The conclusion may be further strengthened by comparison with alternative models and (optionally) evidence from additional data.

Abstract

An abundant literature reports on ‘sequential effects’ observed when humans make predictions on the basis of stochastic sequences of stimuli. Such sequential effects represent departures from an optimal, Bayesian process. A prominent explanation posits that humans are adapted to changing environments, and erroneously assume non-stationarity of the environment, even if the latter is static. As a result, their predictions fluctuate over time. We propose a different explanation in which sub-optimal and fluctuating predictions result from cognitive constraints (or costs), under which humans nonetheless behave rationally. We devise a framework of costly inference, in which we develop two classes of models that differ by the nature of the constraints at play: in one case the precision of beliefs comes at a cost, resulting in an exponential forgetting of past observations, while in the other beliefs with high predictive power are favored. To compare model predictions to human behavior, we carry out a prediction task that uses binary random stimuli, with probabilities ranging from 0.05 to 0.95. Although in this task the environment is static and the Bayesian belief converges, subjects’ predictions fluctuate and are biased toward the recent stimulus history. Both classes of models capture this ‘attractive effect’, but they depart in their characterization of higher-order effects. Only the precision-cost model reproduces a ‘repulsive effect’, observed in the data, in which predictions are biased away from stimuli presented in more distant trials. Our experimental results reveal systematic modulations in sequential effects, which our theoretical approach accounts for in terms of rationality under cognitive constraints.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    In this paper, the authors develop new models of sequential effects in a simple Bernoulli learning task. In particular, the authors show evidence for both a "precision-cost" model (precise posteriors are costly) and an "unpredictability-cost" model (expectations of unpredictable outcomes are costly). Detailed analyses of experimental data partially support the model predictions.

    Strengths:

    • Well-written and clear.
    • Addresses a long-standing empirical puzzle.
    • Rigorous modeling.

    Weaknesses:

    • No model adequately explains all of the data.
    • New empirical dataset is somewhat incremental.
    • Aspects of the modeling appear weakly motivated (particularly the unpredictability model).
    • Missing discussion of some relevant literature.

    We thank Reviewer #1 for her/his positive assessment of our work and for her/his comments and suggestions.

    Reviewer #2 (Public Review):

    This paper argues for an explanation of sequential effects in prediction based on the computational cost of representing probability distributions. This argument is made by contrasting two cost-based models with several other models in accounting for first- and second-order dependencies in people's choices. The empirical and modeling work is well done, and the results are compelling.

    We thank Reviewer #2 for her/his positive comments on our work.

    The main weaknesses of the paper are as follows:

    1. The main argument is against accounts of dependency based on sensitivity to statistics (i.e., modeling the time series as having dependencies it doesn't have). However, such models are not included in the model comparison, which makes it difficult to compare these hypotheses.

    Many models in the sequential-effects literature (Refs. [7-12] in the manuscript) are ‘leaky-integration’ models that interpret sequential effects as resulting from an attempt to learn the statistics of a sequence of stimuli, through exponentially decaying counts of the simple patterns in the sequence (e.g., single stimuli, repetitions, and alternations). In some studies, the ‘forgetting’ of remote observations that results from the exponential decay is justified by the fact that people live in environments that are usually changing: it is thus natural that they should expect the statistics underlying the task’s stimuli to undergo changes (although in most experiments, they do not), and if they expect changes, then they should discard old observations that are no longer relevant. This theoretical justification raises the question as to why subjects do not seem to learn that the generative parameters in these tasks are in fact not changing, all the more as other studies suggest that subjects are able to learn the statistics of changes (and, consistently, to adapt their inference) when the environment does undergo changes (Refs. [42,57]).

    Our models follow a different approach: we derive behavior from the resolution of a problem of constrained optimization of the inference process; ours is not a phenomenological model. When the constraint that weighs on the inference process is a cost on the precision of the posterior, as measured by its entropy, we find that the resulting posterior is one in which remote observations are ‘forgotten’ through an exponential discount, i.e., we recover the predictions of the leaky-integration models, which past studies have empirically found to be reasonably good accounts of sequential effects. (Thus these models are already in our model comparison.) In our framework, the sequential effects do not stem from the subjects’ irrevocable belief that the statistics of the stimuli change from time to time, but rather from the difficulty that they have in representing precise beliefs, which is a rather different theoretical justification.

    Furthermore, we show that a large fraction of subjects are not best fitted by precision-cost models (i.e., they are not best fitted by leaky integration), but instead are best fitted by unpredictability-cost models. These models suggest a different explanation of sequential effects: that they result from the subjects favoring predictable environments, in their inference. In the revised version of the manuscript, we have made clearer that the derivation of the optimal posterior under a precision cost results in the exponential forgetting of remote observations, as in the leaky-integration models. We mention it in the abstract, in the Introduction (l. 76-78), in the Results when presenting the precision-cost models (l. 264-278), and in the Discussion (l. 706-716).

    2. The task is not incentivized in any way. Since incentives are known to affect probability-matching behaviors, this seems important. In particular, we might expect incentives would trade off against computational costs - people should increase the precision of their representations if it generates more reward.

    We thank Reviewer #2 for her/his attention to our paper and for her/his comments. As for the point on the models, see answer above (point 1).

    As for the point on incentivization: we agree that it would be very interesting to measure whether and to what extent the performance of subjects increases with the level of incentivization. Here, however, we wanted, first, to establish that subjects’ behavior could be understood as resulting from inference under a cost, and second, to examine the sensitivity of their predictions to the underlying generative probability, rather than to manipulate a trade-off involving this cost (e.g., with financial rewards). We note that we do find that subjects are sensitive to the generative probability, which implies that they exhibit some degree of motivation to put effort into the task (which is the goal of incentivization), in spite of the lack of economic incentives. But it would indeed be interesting to know how the potential sensitivity to reward interacts with the sensitivity to the generative probability. Furthermore, as Reviewer #2 mentions, some studies show that incentives affect probability-matching behavior: it is then unclear whether the introduction of incentives in our task would change the inference of subjects (through a modification of the optimal trade-off that we model); or whether it would change their probability-matching behavior, as modeled by our generalized probability-matching response-selection strategy; or both. Note that we disentangled both aspects in our modeling and that our conclusions are about the inference, not the response-selection strategy. We deem the incentivization effects very much worth investigating; but they fall outside the scope of our paper.

    We now mention this point in the Discussion of the revised manuscript (l. 828-840).

    3. The sample size is relatively small (20 participants). Even though a relatively large amount of data is collected from each participant, this does make it more difficult to evaluate the second-order dependencies in particular (Figure 6), where there are large error bars and the current analysis uses a threshold of p < .05 across a large number of tests hence creating a high false-discovery risk.

    We agree with Reviewer #2 that as the number of tests increases, so does the probability that at least one null hypothesis is rejected at a given level, even when the null hypotheses are true. But in panels a, b, and c of Figure 6, about half of the tests are rejected, which is very unlikely under the null hypothesis that there is no effect of the stimulus history on the prediction, all the more as the signs of the non-significant results are in most cases consistent with the direction of the significant results. (In panel e, which reports a finer analysis in which the number of subjects is essentially halved, about a fourth of the tests are rejected, and here also the non-significant results are almost all in the same direction as the significant ones.)

    However, we agree that there remains a risk of false discovery; we therefore applied a Bonferroni-Holm-Šidák correction to the p-values in order to mitigate this risk. With these more conservative p-values, fewer tests are rejected, but in most cases in Fig. 6abc the effects remain significant. In particular, we are confident that there is a repulsive effect of the third-to-last stimulus in the case of Fig. 6c, while there is an attractive effect in the other cases.

    In the revised manuscript, Figure 6 now reports whether the tests are rejected when the p-values are corrected with the Bonferroni-Holm-Šidák correction.

    (We also applied this correction to the p-values of the tests in Fig. 2, which has more data: the corrected p-values are all below 1e-13, which we now indicate in the caption of this figure.)
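
    For reference, a step-down Holm-Šidák correction of the kind referred to above can be applied with standard statistical libraries; here is a minimal sketch (the p-values below are made up for illustration and are not those of our tests):

    ```python
    from statsmodels.stats.multitest import multipletests

    raw_p = [0.001, 0.012, 0.034, 0.21, 0.47]          # illustrative p-values
    reject, corrected_p, _, _ = multipletests(raw_p, alpha=0.05, method='holm-sidak')
    print(reject)        # which tests remain significant after the correction
    print(corrected_p)   # corrected p-values
    ```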

    4. In the key analyses in Figure 4, we see model predictions averaged across participants. This can be misleading, as the average of many models can produce behavior outside the class of functions the models themselves can generate. It would be helpful to see the distribution of raw model predictions (ideally compared against individual data from humans). Minimally, showing predictions from representative models in each class would provide insight into where specific models are getting things right and wrong, which is not apparent from the model comparison.

    In the main text of the original manuscript, we showed the behavior of the pooled responses of the best-fitting models, and we agree with Reviewer #2 that this did not allow the reader to verify that the apparent ability of the models to reproduce the subjects’ behavioral patterns was not a misleading byproduct of averaging across different models. In the original version of the manuscript, we had placed a figure showing the behavior of each individual model (each cost type with each Markov order) in the Methods section of the paper; but this could easily be overlooked, and it would indeed be beneficial for the reader to be shown the typical behaviors of the models in the main text. We have reorganized the presentation of the models’ behaviors: the first panels in Fig. 4 (in the main text) are now dedicated to showing the individual sequential effects of the precision-cost and of the unpredictability-cost models with Markov order 0 and 1. Figure 4 is reproduced in the response to Reviewer #1, above, along with comments on the sequential effects produced by these models (and also on the impact of the generalized probability-matching response-selection strategy, in comparison with traditional probability matching). We believe that this figure makes clearer how the individual models are able to reproduce the patterns in subjects’ predictions; in particular, it shows that this ability of the models is not just an artifact of the averaging of many models, which was the legitimate concern of Reviewer #2. We have left the illustration of the first-order sequential effects of the other models (with Markov order 2 and 3) in the Methods section (Fig. 7), so as not to overload Fig. 4, and because they do not bring new critical conceptual points.

    As for the higher-order sequential effects, the updated Figure 5, also reproduced above in the responses to Reviewer #1, now includes the sequential effects obtained with the precision-cost model of a Bernoulli observer (m=0), in addition to the precision-cost model of a Markov observer (m=1) and to the unpredictability-cost model of a Markov observer (m=3), in order to better illustrate the behaviors of the different models. The higher-order sequential effects of the other models can be found in Fig. 8 in Methods.

    Reviewer #3 (Public Review):

    This manuscript offers a novel account of history biases in perceptual decisions in terms of bounded rationality, more specifically in terms of finite resources strategy. Bridging two works of literature on the suboptimalities of human decision-making (cognitive biases and bounded rationality) is very valuable per se; the theoretical framework is well derived, building upon the authors' previous work; and the choice of experiment and analysis to test their hypothesis is adequate. However, I do have important concerns regarding the work that do not enable me to fully grasp the impact of the work. Most importantly, I am not sure whether the hypothesis whereby inference is biased towards avoiding high precision posterior is equivalent or not to the standard hypothesis that inference "leaks" across time due to the belief that the environment is not stationary. This and other important issues are detailed below. I also think that the clarity and architecture of the manuscript could be greatly improved.

    We thank Reviewer #3 for her/his positive assessment of our work and for her/his comments and suggestions.

    1. At this point it remains unclear what is the relationship between the finite resources hypothesis (the only bounded rationality hypothesis supported by the data) and more standard accounts of historical effects in terms of adaptation to a (believed to be) changing environment. The Discussion suggests that the two approaches are similar (if not identical) at the algorithmic level: in one case, the posterior belief is stretched (compared to the Bayesian observer for stationary environments) due to precision cost, in other because of possible changes in the environment. Are the two formalisms equivalent? Or could the two accounts provide dissociable predictions for a different task? In other words, if the finite resources hypothesis is not meant to be taken as brain circuits explicitly minimizing the cost (as stated by the authors), and if it produces the same type of behavior as more classical accounts: is the hypothesis testable experimentally?

    We agree with Reviewer #3 that the relation between our approach and other approaches in the literature should be made clearer to the reader.

    Since the 1990s, in the psychology and neuroscience literature, many models of perception and decision-making have featured an exponential decay of past observations, resulting in an emphasis, in decisions, of the more recent evidence (‘leaky integration’, Refs. [7-12, 76-86]). In the context of sequential effects, this mechanism has found a theoretical justification in the idea that people believe that statistics typically change, and thus that remote observations should indeed be discarded [8,12]. In inference tasks with binary signals, in which the optimal Bayesian posterior is in many cases a Beta distribution whose two parameters are the counts of the two signals, one way to conveniently incorporate a forgetting mechanism is to replace these counts with exponentially-filtered counts, in which more recent observations have more weight (e.g., Ref. [12]).
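
    To make this forgetting mechanism concrete, here is a minimal sketch of such exponentially-filtered counts and of the resulting Beta posterior (the decay factor and the uniform prior are illustrative choices, not the parameterization used in the manuscript):

    ```python
    import numpy as np
    from scipy.stats import beta

    def leaky_counts(observations, gamma=0.9):
        """Exponentially-filtered counts of two binary outcomes (1 = 'A', 0 = 'B'):
        at each trial the previous counts are discounted by gamma, so recent
        observations carry more weight than remote ones."""
        n_a = n_b = 0.0
        for x in observations:
            n_a = gamma * n_a + (x == 1)
            n_b = gamma * n_b + (x == 0)
        return n_a, n_b

    rng = np.random.default_rng(0)
    obs = (rng.random(200) < 0.7).astype(int)    # Bernoulli sequence with p = 0.7
    n_a, n_b = leaky_counts(obs, gamma=0.9)
    approx_posterior = beta(n_a + 1, n_b + 1)    # Beta posterior on the filtered counts
    print(n_a, n_b, approx_posterior.mean())
    ```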

    Our approach to sequential effects is not grounded in the history of leaky-integration models: we assume, first, that subjects attempt to learn the statistics of the signals presented to them (this is also the assumption in many studies [7-12]), and second, that their inference is subject to a cost, which prevents them from reaching the optimal, Bayesian posterior; but under the constraint of this cost, they choose the optimal posterior. We formalize this as a problem of constrained optimization.

    The two formalisms are thus not equivalent. Beyond the fact that we clearly state the problem that we assume the brain is solving, we do not propose that the origin of sequential effects resides in an adaptation to putatively changing environments: instead, we assume that they originate in a cognitive cost internal to the decision-maker. If this cost weighs on the precision of the posterior, as measured by its entropy (our precision cost), then the optimal approximate posterior is one in which remote observations are ‘forgotten’ through an exponential filter, as in the leaky-integration models. In other words, in the context of this task and with this kind of cost, the models are, as Reviewer #3 writes, identical at the algorithmic level. As for the unpredictability cost, it does not result in a solution that resembles leaky integration; about half the subjects, however, are best fitted by unpredictability-cost models. We thus provide a different rationale for sequential effects, namely that the brain favors predictable environments in its inference, and this alternative account is successful in capturing the behavior of a large fraction of the subjects.

    In the revised manuscript, we now clarify that the precision cost results in leaky integration, in the abstract, in the Introduction (l. 76-78), in our presentation of the precision-cost models (Results section, l. 264-275), and in the Discussion (l. 706-716). (We also refer Reviewer #3 to our response to the first comment of Reviewer #2, above.)

    Finally, Reviewer #3 asks the interesting question as to whether the “two accounts provide dissociable predictions for a different task”. Given that the leaky-integration approach is justified by an adaptation to potential changes, and our approach relies on the hypothesis that precision in beliefs is costly, one way to disentangle the two would be to eliminate the sequential nature of the task and instead present the observations simultaneously. This would eliminate the very notion of change across time. In this case, the leaky account would predict that subjects’ inference becomes optimal (because the leak should disappear in the absence of change), while in our approach the precision cost would still weigh on the inference, resulting in approximate posteriors that are “wider” (less precise) than the optimal one. The resulting divergence in the predictions of these models is very interesting, but outside the scope of this study on sequential effects.

    2. The current analysis of history effects may be confounded by effects of the motor responses (independently from the correct response), e.g. a tendency to repeat motor responses instead of (or on top of) tracking the distribution of stimuli.

    We thank Reviewer #3 for pointing out the possibility that subjects may have a tendency to repeat motor responses that is not related to their inference.

    We note that in Urai et al., 2017, as in many other sensory 2AFC tasks, successive trials are independent: the stimulus at a given trial is a random event independent of the stimulus at the preceding trial; the response at a given trial should in principle be independent of the stimulus at the preceding trial; and the response at the preceding trial conveys no information about the response that should be given at the current trial (although subjects might exhibit a serial dependency in their responses). By contrast, in our task an event is more likely than not to be followed by the same event (because observing this event suggests that its probability is greater than .5); and a prediction at a given trial should be correlated with the stimuli at the preceding trials, and with the predictions at the preceding trials. In a logit model (or any other GLM), this would mean that the predictors exhibit multicollinearity, i.e., they are strongly correlated. Multicollinearity does not reduce the predictive power of a model, but it makes the identification of parameters extremely unreliable: in other words, we wouldn’t be able to confidently attribute to each predictor (e.g., the past observations and the past responses) a reliable weight in the subjects’ decisions. Furthermore, our study shows that past stimuli can yield both attractive and repulsive effects, depending on the exact sequence of past observations. To capture this in a (generalized) linear model, we would have to introduce interaction terms for each possible past sequence, resulting in a very high number of parameters to be identified.
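
    To illustrate the multicollinearity at issue, here is a toy simulation (the response rule and all parameters are made up for illustration, and are not those of our task or models); the lagged-stimulus and lagged-response regressors come out strongly correlated, which is what makes the weights of a GLM unreliable to identify:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    stim_all, resp_all = [], []
    for p in np.linspace(0.05, 0.95, 10):        # one generative probability per block
        stim = (rng.random(200) < p).astype(float)
        resp = np.empty_like(stim)
        resp[0] = stim[0]
        for t in range(1, len(stim)):
            # toy rule: track the previous stimulus, or repeat the previous response
            resp[t] = stim[t - 1] if rng.random() < 0.7 else resp[t - 1]
        stim_all.append(stim)
        resp_all.append(resp)

    stim = np.concatenate(stim_all)
    resp = np.concatenate(resp_all)
    predictors = np.vstack([stim[1:-1],          # stimulus at trial t-1
                            stim[:-2],           # stimulus at trial t-2
                            resp[1:-1]])         # response at trial t-1
    print(np.corrcoef(predictors).round(2))      # off-diagonal entries well above 0
    ```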

    However, this does not preclude the possibility that subjects have a motor propensity to repeat responses. To take this hypothesis into account, we examined the behavior, and the ability to capture subjects’ data, of models in which the response-selection strategy allows for the possibility of repeating, or alternating, the preceding response. Specifically, we consider models that are identical to those in our study, except for the response-selection strategy, which is an extension of the generalized probability-matching strategy: a parameter η, greater than -1 and smaller than 1, determines the probability that the model subject repeats its preceding response or, conversely, alternates and chooses the other response. With probability 1-|η|, the model subject follows the generalized probability-matching response-selection strategy (parameterized by κ). With probability |η|, the model subject repeats the preceding response, if η > 0, or chooses the other response, if η < 0. We included the possibility of an alternation bias (negative η), but we find that no subject is best fitted by a negative η, and we thus focus on the repetition bias (positive η). We fit the models by maximizing their likelihoods, and we compared, using the Bayesian Information Criterion (BIC), the quality of their fits to that of the original models, which do not include a repetition propensity.
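
    For concreteness, here is a minimal sketch of this extended response-selection step (a schematic illustration, not the fitted implementation; it assumes that the generalized probability-matching strategy predicts ‘A’ with probability given by the inferred probability of A raised to the power κ and normalized):

    ```python
    import numpy as np

    def select_response(p_inferred_A, prev_response, kappa, eta, rng):
        """Response selection with a repetition (eta > 0) or alternation (eta < 0) propensity.

        With probability |eta|, the previous response (coded 1 for 'A', 0 for 'B')
        is repeated or switched; otherwise the response follows the generalized
        probability-matching strategy, assumed here to predict 'A' with probability
        p^kappa / (p^kappa + (1-p)^kappa), where p is the inferred probability of A.
        """
        if rng.random() < abs(eta):
            return prev_response if eta > 0 else 1 - prev_response
        p_A = p_inferred_A ** kappa / (p_inferred_A ** kappa + (1 - p_inferred_A) ** kappa)
        return int(rng.random() < p_A)

    rng = np.random.default_rng(0)
    print(select_response(p_inferred_A=0.7, prev_response=1, kappa=2.0, eta=0.2, rng=rng))
    ```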

    Taking into account the repetition bias of subjects leaves the assignment of subjects to the two families of inference costs mostly unchanged. We find that for 26% of subjects the introduction of the repetition propensity does not improve the fit (as measured by the BIC) and can therefore be discarded. For 47% of subjects, the fit is better with the repetition propensity (lower BIC), and the best-fitting inference model (i.e., the type of cost, precision or unpredictability, and the Markov order) is the same with or without the repetition propensity. Thus for 73% (= 26 + 47) of subjects, allowing for a repetition propensity does not change the inference model. We also find that the best-fitting parameters λ and κ, for these subjects, are very stable, whether or not we allow for the repetition propensity. For 11% of subjects, the fit is better with the repetition propensity, and the cost type of the inference model is the same as without it, but the Markov order changes. For the remaining 16%, both the cost type and the Markov order change.
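
    As a reminder of the criterion underlying this comparison, here is a minimal sketch of the BIC computation (the log-likelihoods, parameter counts, and number of trials below are made up for illustration):

    ```python
    import numpy as np

    def bic(max_log_likelihood, n_params, n_obs):
        """Bayesian Information Criterion: lower is better, with a penalty that
        grows with the number of free parameters."""
        return n_params * np.log(n_obs) - 2.0 * max_log_likelihood

    # Hypothetical comparison of a model without and with the repetition
    # propensity (one extra parameter): the richer model is retained only
    # if it lowers the BIC.
    print(bic(max_log_likelihood=-520.0, n_params=2, n_obs=400),
          bic(max_log_likelihood=-512.0, n_params=3, n_obs=400))
    ```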

    Thus for a majority of subjects, the BIC is improved when a repetition propensity is included, suggesting that there is indeed a tendency to repeat responses, independent of the subjects’ inference process and of the generative stimulus probability. In Figure 7, in Methods, we show the behavior of the models without repetition propensity, and with repetition propensity, with a parameter η = 0.2 close to the average best-fitting value of η across subjects. We show, in Methods, that (i) the unconditional probability of a prediction A, p(A), is the same with and without repetition propensity, and that (ii) the conditional probabilities p(A|A) and p(A|B) when η ≠ 0 are weighted means of the unconditional probability p(A) and of the conditional probabilities when η = 0 (see p. 47-49 of the revised manuscript).

    In summary, our results suggest that a majority of subjects do exhibit a propensity to repeat their responses. Most subjects, however, are best-fitted by the same inference model, with or without repetition propensity, and the parameters λ and κ are stable, across these two cases; this speaks to the robustness of our model fitting. We conclude that the models of inference under a cost capture essential aspects of the behavioral data, which does not exclude, and is not confounded by, the existence of a tendency, in subjects, to repeat motor responses.

    In the revised manuscript, we present this analysis in Methods (p.47-49), and we refer to it in the main text (l. 353-356 and 400-406).

    3. The authors assume that subjects should reach their asymptotic behavior after passively viewing the first 200 trials but this should be assessed in the data rather than hypothesized. Especially since the subjects are passively looking during the first part of the block, they may well pay very little attention to the statistics.

    The assumption that subjects reach their asymptotic behavior after being presented with 200 observations in the passive trials should indeed be tested. To that end, we compared the behavior of the subjects in the first 100 active trials with their behavior in the remaining 100 active trials. The results of this analysis are shown in Figure 9.

    For most values of the stimulus generative probability, the unconditional proportions of predictions A, in the first and the second half (panel a, solid and dashed gray lines), are not significantly different (panel a, white dots), except for two values (p-value < 0.05; panel a, filled dots). Although in most cases the difference between the two is not significant, in the second half the proportions of prediction A seem slightly closer to the extremes (0 and 1), i.e., closer to the optimal proportions. As for the sequential effects, they appear very similar in the two halves of trials. We conclude that for the purpose of our analysis we can reasonably consider that the behavior of the subjects is stationary throughout the task.
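
    For reference, the kind of two-proportion comparison described above can be sketched as follows (the counts are made up for illustration and do not reproduce the statistics of the data):

    ```python
    from statsmodels.stats.proportion import proportions_ztest

    counts_A = [62, 68]      # predictions 'A' in the first / second half of active trials
    n_trials = [100, 100]    # number of active trials in each half
    stat, p_value = proportions_ztest(counts_A, n_trials)
    print(stat, p_value)     # p >= 0.05: no significant difference between halves
    ```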

    4. The experiment methods are described quite poorly: when is the feedback provided? What is the horizontal bar at the bottom of the display? What happens in the analysis with timeout trials and what percentage of trials do they represent? Most importantly, what were the subjects told about the structure of the task? Are they told that probabilities change over blocks but are maintained constant within each block?

    We thank Reviewer #3 for her/his close attention to the details of our experiment. Here are the answers to the reviewer’s questions:

    • The feedback (i.e., a lightning strike on the left or the right rod, with the rod and the battery turning yellow if the strike is on the side predicted by the subject) is immediate, i.e., it is provided right after the subject makes a prediction, with no delay. We now indicate this in the caption of Figure 1.

    • The task is presented to the subjects as a game in which predicting the correct location of the lightning strike results in electric power being collected in the battery. The horizontal bar at the bottom of the display is a gauge that indicates the amount of power collected in the current block of trials. It has no operational value in the task. We now mention it in the Methods section (l. 872-874).

    • The timeout trials were not included in the analysis. The timeout trials represented 1.27% of the trials, on average (across subjects); and for 95% of the subjects the timeout trials represented less than 2.5% of the trials. This information was added in Methods (l. 887-889).

    • Each new block of trials was presented to the subject as the lightning strikes occurring in a different town. The 200 passive trials at the beginning of each block, in which subjects were asked to observe a sequence of 200 strikes, were presented as the ‘track record’ for that town, and the instructions indicated that it was ‘useful’ to know this track record. No information was given on the mechanism governing the locations of the strikes. In the main text of the revised manuscript, we now include these details when describing the task (p. 6).
