Efficient decision-makers evaluate relative reward per effort

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    The paper describes an interesting, but very abstract extension of normative choice theories. By linking economic and foraging theory, the paper would potentially be of interest to a broad audience in behavioral economics and neuroscience. However, the results in their current form have several important limitations: the lack of a significant validation, such as an account for well-known behavioral or neural effects that would not be explained by alternative theories, a quantitative performance comparison between the proposed EDM and other models in realistic behavioral situations, and a specific link between the actual processes and limitations of real brains and the EDM.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

Abstract

Understanding how humans and animals can make effective decisions would have a profound impact on economics, psychology, ecology, and related fields. Neoclassical economics provides a formalism for optimal decisions, but the apparatus requires a large number of evaluations of the decision options as well as representations and computations that are biologically implausible. This article shows that natural constraints distill the economic optimization into an efficient and biologically plausible decision strategy. In this strategy, decision-makers evaluate the relative reward across their options and allocate their effort proportionally, thus equalizing the reward per effort across their options. Using a combination of analytical and simulation approaches, the article shows that this strategy is efficient, providing optimal or near-optimal gain following a single evaluation of decision options. The strategy is also rational; satisficing and indifferent decision-makers are found to perform relatively poorly. Moreover, the relativistic value functions underlying this efficient strategy provide an account of time discounting, effort discounting, and resource discounting in general.
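
As an illustration only (it is not taken from the manuscript), the proportional strategy described in the abstract can be sketched in a few lines of Python. The function name `allocate_effort` and the reward values are hypothetical; the sketch assumes that each option has a strictly positive evaluated reward and that total effort is normalized to 1.

```python
def allocate_effort(rewards):
    """Split one unit of effort across options in proportion to their rewards.

    Assumption for this sketch: every reward is strictly positive.
    """
    if any(r <= 0 for r in rewards):
        raise ValueError("this sketch assumes strictly positive rewards")
    total = sum(rewards)
    return [r / total for r in rewards]


# Hypothetical rewards observed in a single evaluation of three options.
rewards = [6.0, 3.0, 1.0]
effort = allocate_effort(rewards)

# Reward per unit of effort for each option; proportional allocation
# makes these values equal across options.
reward_per_effort = [r / e for r, e in zip(rewards, effort)]
```

Under these assumptions the effort shares sum to 1 and every option yields the same reward per unit of effort, which is the equalization the abstract refers to.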

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    In this paper, Jan Kubanek attempts to derive an 'effective decision strategy' that is optimal (and therefore normative) given certain constraints resulting from computational capacity limitations. The author first points out that neoclassical economics (i.e., expected utility theory, EUT) provides normative predictions for decisions to maximize utility. Next, he (correctly) points out that finding the optimal solutions to decision problems requires computational resources that are unlikely to exist in actually existing decision-makers (animals and humans). He claims that this fact is the most severe problem for concluding that EUT is an accurate description of actual human or animal decision processes. I disagree with him on this point as I will lay out in more detail below. Next, the author attempts to find an 'efficient' (i.e., computationally reasonable) decision strategy that comes close to the original normative framework. He claims that such a strategy is EDM, whereby decisions are made by allocating relative effort in proportion to the relative reward of each option.

    Overall, I find this paper hard to judge. The considerations described in this paper are certainly interesting and I have no reason to presume that the mathematical derivations described are wrong (without having made an effort to follow and check them in detail). Still, I find the paper, in the end, sterile and I fear it will have only limited impact. I think the manuscript should be expanded in three different directions to make it more relevant for the neuroscientific understanding of decision making.

    First, the author needs to show that EDM can also explain other known violations of EUT related to the axiom of regularity (i.e., preferences between two options should not be affected by the presence of inferior options). This seems relevant because these behavioral effects robustly violate the choice allocation strategy of EDM.

    Second, EDM is so abstract that the actual structure and capacity of the nervous system are nearly irrelevant. The author should consider more deeply the computational requirements and capacities of different types of brains (fruit flies, frogs, and primates) and the consequences of these differences for what is (or should be) achievable in terms of optimal behavior.

    Third, the paper contains no test for EDM. This is in part because EDM is at no point compared to the predictions of alternative theories.

    I thank the Reviewer for these constructive comments, which are addressed below.

    My specific concerns are as follows:

    (1) The author claims that the most severe problem of EUT is that it is computationally implausible. However, I disagree.

    It could be claimed that EUT describes an (unattainable) optimal state that actual brains try to accomplish with limited resources. (In essence, the current paper follows this strategy).

    Correct, EDM stems from Expected Utility Theory subjected to specific biological considerations, as shown in Figure 1.

    Given this origin, the paper now makes more appropriate statements regarding the biologically-relevant shortcomings of EUT:

    i) Abstract: "the apparatus requires a large number of evaluations of the decision options as well as neural representations and computations that are not biologically plausible." → "the apparatus requires a large number of evaluations of the decision options as well as neural representations and computations that are difficult to implement at the biological level"

    ii) Introduction: "To address these biologically implausible requirements, …" → "To address the biological constraints, …"

    I think the situation is much more dire. During the last 70 years, a small army of psychologists and behavioral economists have described a large number of violations of EUT's normative predictions: the Allais paradox, framing effects, the behavioral tendencies summarized in Prospect theory, and others. These differences between behavior and normative predictions are important because they violate basic assumptions of the normative theory.

    Prospect theory can be readily incorporated into EDM.

    This has resulted in the following paragraph in the Discussion:

    "Notably, the 𝑢ᵢ and 𝑒ᵢ variables can incorporate additional factors such as the probability of an outcome, as in prospect theory (Kahneman and Tversky, 1979). A previous study (Kubanek, 2017) demonstrates that prospect theory’s incorporation of probabilities into utilities does not change the relationship between the differential formulation of Equation 1 and the fractional formulation of Equation 2, which is crucial for EDM. Moreover, the 𝑢ᵢ and 𝑒ᵢ variables can be entirely subjective. So long as the representations are comparable by the brain (e.g., through relative firing rates; Figure 6), the 𝑒ᵢ = 𝑢ᵢ strategy provides an efficient allocation of the decision-maker’s resources."

    (2) The most interesting case of such violations is a set of well-known behavioral effects that occur in the context of multi-alternative, multi-attribute decision making. They are known as the attraction, similarity, and compromise effects (there is a large literature; more recently: Dumbalska T, Li V, Tsetsos K, Summerfield C. A map of decoy influence in human multi alternative choice. Proc Natl Acad Sci U S A. 2020 Oct 6;117(40):25169-25178. doi: 10.1073/pnas.2005058117. Epub 2020 Sep 21). These biases have received so much attention because they violate a very basic axiom of EUT: choices between two options should not be affected by the presence of a third option that is inferior to both of them. However, that is exactly what happens in these choice biases. The effects have been shown in many species ranging from humans to amphibians to invertebrates. As far as I can see, EDM cannot explain how choice allocation between two options A and B that have equal value would be changed by the inclusion of a new option D that is of lower value than A or B, in such a way that D is not chosen at all but A is chosen more often than B if D is similar in attributes to A (the 'attraction' effect). If I am mistaken, the inclusion of an explanation of how this would work would be of major importance.

    The new Figure 6 provides a starting point for addressing these effects.

    Specifically, this comment has resulted in the following Discussion paragraph:

    "In EDM, the relativistic representation of utility at the neural level (black bars in Figure 6) involves divisive normalization. Divisive normalization is a common operation performed by neural circuits (Carandini and Heeger, 2012). The specific form of this operation may be crucial for explaining attraction, similarity, and compromise effects observed in multi-alternative, multi-attribute decision environments (Noguchi and Stewart, 2014; Dumbalska et al., 2020). For instance, it has been found that a transformation of utilities by specific monotonic functions prior to divisive normalization can explain these behavioral effects parsimoniously (Dumbalska et al., 2020). On this front, monotonic transformations and divisive normalization are performed by several kinds of feedforward and feedback neural circuits (Lek et al., 1996; Carandini and Heeger, 2012). Nonetheless, how exactly individual attributes of decision options are encoded at the neural level should be investigated using large-scale neuronal recordings."

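    The role of divisive normalization in the paragraph above can be made concrete with a minimal Python sketch. It is an illustration under simplifying assumptions, not the model of Figure 6: each option is reduced to a single hypothetical utility value, and normalization is plain division by the sum.

```python
def divisive_normalization(utilities):
    """Divide each utility by the summed utility of all options."""
    total = sum(utilities)
    if total <= 0:
        raise ValueError("utilities must have a positive sum")
    return [u / total for u in utilities]


# Hypothetical utilities: options A and B are equally valued.
two_options = divisive_normalization([10.0, 10.0])

# The same two options plus a lower-valued decoy D (hypothetical value).
with_decoy = divisive_normalization([10.0, 10.0, 5.0])

# Ratio of the normalized values of A and B, without and with the decoy.
ratio_without_decoy = two_options[0] / two_options[1]
ratio_with_decoy = with_decoy[0] / with_decoy[1]
```

    In this simplified form the decoy lowers the normalized values of A and B equally, so their ratio is unchanged. Plain normalization of a single utility per option therefore cannot, on its own, produce an attraction effect, which is consistent with the paragraph's point that the specific form of the operation matters.
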
    (3) EDM as described in this manuscript is completely static; that is, it ignores the actual computational processes that underlie decision making. This is in opposition to an important modern branch of decision research that has stressed the importance of understanding processes (and their limitations) to understand how choices are made. Examples are: (1) Roe RM, Busemeyer JR, Townsend JT. Multialternative decision field theory: a dynamic connectionist model of decision making. Psychol Rev. 2001 Apr;108(2):370-92. doi: 10.1037/0033-295x.108.2.370. PMID: 11381834; (2) Tsetsos K, Usher M, Chater N. Preference reversal in multiattribute choice. Psychol Rev. 2010 Oct;117(4):1275-93. doi: 10.1037/a0020580. PMID: 21038979. The relationship between EDM and algorithmic implementations should be explored.

    This point has been addressed in the following ways:

    1. EDM is now implemented at the algorithmic level while positioned within stochastic choice environments.

    2. The performance of EDM in the stochastic environments is reported in a new Figure 4.

    3. The performance of EDM within the stochastic and deterministic environments is now compared in a new Figure 5. The figure shows that both environments support the same principal conclusions.

    4. Figure 4b provides mechanistic examples of the individual effort allocations by EDM and alternative strategies.

    5. The Discussion includes a new paragraph that places EDM within a broader context of algorithmic implementations:

    "In deterministic environments, EDM comprises a single stage that embodies Equation 7. This rule is analogous to the evolutionarily stable “relative reward sum” in ecology (Harley, 1981; Hamblin and Giraldeau, 2009) and “local fractional income” in neuroscience (Sugrue et al., 2004). In dynamic and stochastic environments, the strategy should additionally incorporate an integration stage that mitigates the effect of noise and thus provides meaningful estimates of the worth of each option. Several approaches can be used to keep track of dynamic, stochastic environments and thus estimate their relative worth 𝑢ᵢ. The most compact are related to reinforcement learning, in which previous payoffs are discounted exponentially using a “learning rate.” This approach has been applied in ecology (Harley, 1981; Hamblin and Giraldeau, 2009), computer science (Sutton and Barto, 1998), neuroscience (Sugrue et al., 2004; Corrado et al., 2005), and was also applied here when assessing performance in stochastic environments. One benefit of this free parameter is that decision-makers can adapt the learning rate to the speed of change or the level of stochasticity of specific decision situations (Iigaya et al., 2019)."

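    The integration stage described in the paragraph above can be sketched as follows. This is a generic exponentially weighted update of the kind used in reinforcement learning, offered as an illustration only; the function names, the payoff sequences, and the learning rate of 0.5 are assumptions and are not taken from the manuscript's simulations.

```python
def update_estimate(estimate, payoff, learning_rate):
    """Move the running estimate toward the latest payoff.

    Applied repeatedly, this weights past payoffs by exponentially
    decaying factors of (1 - learning_rate).
    """
    return estimate + learning_rate * (payoff - estimate)


def relative_worth(estimates):
    """Convert running estimates into fractions that sum to 1."""
    total = sum(estimates)
    if total == 0:
        # No information yet: treat all options as equally valuable.
        return [1.0 / len(estimates)] * len(estimates)
    return [e / total for e in estimates]


# Hypothetical payoff histories for two options (1 = rewarded, 0 = not).
payoffs_a = [1, 0, 1, 1]
payoffs_b = [0, 0, 1, 0]
learning_rate = 0.5  # assumed value, chosen only for illustration

estimate_a = 0.0
estimate_b = 0.0
for payoff_a, payoff_b in zip(payoffs_a, payoffs_b):
    estimate_a = update_estimate(estimate_a, payoff_a, learning_rate)
    estimate_b = update_estimate(estimate_b, payoff_b, learning_rate)

# Relative worth of each option, usable as the effort shares.
shares = relative_worth([estimate_a, estimate_b])
```

    A larger learning rate tracks fast-changing environments more closely at the cost of noisier estimates, which is the trade-off behind the adaptive learning rate mentioned at the end of the quoted paragraph.
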
    (4) Most importantly, what is missing is a clear prediction for a finding (behavioral or neuronal) that would be predicted only by EDM and not by any other theory of decision making. Without such a proposed test, the idea has no scientific merit.

    The paper includes new analyses and text that provide predictions that are specific to EDM. Specifically, this point has been addressed in three ways:

    1. The three predictions that are specific to EDM are now made explicit in a new Figure 5. The figure also provides quantitative support of EDM through performance evaluations across these predictions.

    2. The Results include the following text regarding the key defining properties of EDM: "Figure 5 summarizes and expands on the defining properties of EDM. First, the main finding of this article is that EDM is characterized by high performance following a single evaluation of decision options (Figure 5a). Second, Figure 3a suggested that the proportional allocation of effort to relative utilities (𝛽 → 1) may represent an optimum, at least across the space of effort-utility contingencies tested. Figure 5b-top additionally evaluates the impact of this exponent in the stochastic choice situations. This figure replicates the findings of Figure 3a in that 𝛽 = 1 lies near the optimum, with 𝛽 = 1.0 and 𝛽 = 1.2 providing an average gain of 94.0% and 94.1%, respectively. Thus, the proportional allocation of effort to relative utilities is another defining trait of EDM, and this strategy provides near-optimal performance in all decision situations tested. And third, the effort allocation strategy in EDM, 𝑒_harvest = 𝑢(𝑒_eval), is invoked once regardless of the number of decision options. This is in contrast to optimization, whose convergence time scales with problem dimensionality, i.e., the number of options. The single-evaluation EDM strategy maintains performance across the number of options under the VI schedules (Figure 5c-top; slope 0.67% per option, 𝐹 = 4.46, 𝑝 = 0.073), although it does incur a performance loss (Figure 5c-bottom; slope -0.95% per option, 𝐹 = 34.66, 𝑝 = 0.00061) in the deterministic cases. Notably, to attain the performance of EDM, the theoretical maximizing agents required substantially more evaluations in situations involving a large number of options (Figure 5c gray; top: slope 2.8 evaluations per option, 𝐹 = 21.80, 𝑝 = 0.0023; bottom: slope 3.4 evaluations per option, 𝐹 = 250.2, 𝑝 = 9.7 × 10⁻⁷)."

    3. The Discussion now includes a dedicated paragraph on the testability of the EDM predictions at the behavioral and neural levels:

    "EDM is testable at the behavioral (Figure 5) and neural (Figure 6) levels. At the behavioral level, EDM possesses three distinctive characteristics. First, EDM obtains high reward rapidly (Figure 5a). This characteristic can be tested in choice environments that minimize noise as an additional factor (e.g., Figure 7a), providing performance versus evaluations plots analogous to Figure 2. Second, EDM allocates relative effort to relative utilities proportionally (Figure 5b). This characteristic can be tested in situations in which utilities can be measured precisely, e.g., through the volume of fluid rewards in animal experiments or money in human experiments. And third, EDM allocates effort rapidly regardless of the number of options. This is because the 𝑒_harvest = 𝑢(𝑒_eval) strategy is agnostic to the number of options. This characteristic can be tested by varying the number of decision options and quantifying the number of times a decision-maker evaluates the options. At the neural level, EDM only requires the encoding of relative utilities of the recently sampled options. This relative code can be implemented using firing rates of the neuronal pools representing each alternative (Figure 6 top row, middle column). Indeed, this representation has been found in the primate brain. Specifically, the relative value associated with EDM, termed “fractional income,” captures firing rates of neurons in monkey area LIP (Sugrue et al., 2004)."

    Reviewer #2 (Public Review):

    In this article, Kubanek shows how simple, local decision strategies approximate optimal foraging behavior using analytical methods and model simulations. To ground the argument beyond model simulations, Kubanek generalizes previous theoretical frameworks in economics and foraging to show how evaluating relative utility and effort is sufficient to find optimal behavior. A particular strength of this study is its principled approach to linking general economic theory with foraging theory and deriving the conditions under which local behavioral strategies provide general and efficient solutions to the optimality problem. Re-casting utility and effort in relative terms offers attractive possibilities to apply these formulations in describing a range of phenomena. I, therefore, believe this short report will be of interest to a multidisciplinary audience from economics, psychology, behavior theory, foraging, and neuroscience. The author's main claims are supported by their evidence.

    Potential weaknesses of the study include:

    1. Predictions from the proposed EDM framework are stated in vague terms and could be formulated more concretely and, if possible, included in the model simulations.

    The predictions are now stated explicitly in a new Figure 5 and the associated text, and supported through performance evaluations within deterministic and stochastic choice environments.

    Moreover, a new Figure 6 provides a representational and computational account of EDM, and summarizes the main point of the paper that EDM combines high performance with simple, biologically plausible evaluation.

    2. The specificity of the EDM model and related models is only briefly touched on. The EDM argument could be strengthened by making the relation to other behavioral models more explicit.

    The new Figure 5 now compares the key characteristics of EDM with a set of related and more complex models, and evaluates their performance across these characteristics. The relations to other behavioral models are further specified in two new Discussion paragraphs.

    3. Many behavioral situations, including the study by Sugrue et al. (2004) that is often cited in this paper, involve reward contingencies with a high level of uncertainty and non-stationary environments. While the author mentions these situations at the end of the discussion, it remains vague how exactly EDM performs in such environments or relates to decision strategies that deal with them.

    The article now also includes stochastic choice environments, in addition to the original deterministic choice environments. This has resulted in the new Figures 4 and 5 and the associated text.

    The results in the stochastic environment corroborate those obtained in the deterministic environments.

  2. Evaluation Summary:

    The paper describes an interesting, but very abstract extension of normative choice theories. By linking economic and foraging theory, the paper would potentially be of interest to a broad audience in behavioral economics and neuroscience. However, the results in their current form have several important limitations: the lack of a significant validation, such as an account for well-known behavioral or neural effects that would not be explained by alternative theories, a quantitative performance comparison between the proposed EDM and other models in realistic behavioral situations, and a specific link between the actual processes and limitations of real brains and the EDM.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

  3. Reviewer #1 (Public Review):

    In this paper, Jan Kubanek attempts to derive an 'effective decision strategy' that is optimal (and therefore normative) given certain constraints resulting from computational capacity limitations. The author first points out that neoclassical economics (i.e., expected utility theory, EUT) provides normative predictions for decisions to maximize utility. Next, he (correctly) points out that finding the optimal solutions to decision problems requires computational resources that are unlikely to exist in actually existing decision-makers (animals and humans). He claims that this fact is the most severe problem for concluding that EUT is an accurate description of actual human or animal decision processes. I disagree with him on this point as I will lay out in more detail below. Next, the author attempts to find an 'efficient' (i.e., computationally reasonable) decision strategy that comes close to the original normative framework. He claims that such a strategy is EDM, whereby decisions are made by allocating relative effort in proportion to the relative reward of each option.

    Overall, I find this paper hard to judge. The considerations described in this paper are certainly interesting and I have no reason to presume that the mathematical derivations described are wrong (without having made an effort to follow and check them in detail). Still, I find the paper, in the end, sterile and I fear it will have only limited impact. I think the manuscript should be expanded in three different directions to make it more relevant for the neuroscientific understanding of decision making. First, the author needs to show that EDM can also explain other known violations of EUT related to the axiom of regularity (i.e., preferences between two options should not be affected by the presence of inferior options). This seems relevant because these behavioral effects robustly violate the choice allocation strategy of EDM. Second, EDM is so abstract that the actual structure and capacity of the nervous system are nearly irrelevant. The author should consider more deeply the computational requirements and capacities of different types of brains (fruit flies, frogs, and primates) and the consequences of these differences for what is (or should be) achievable in terms of optimal behavior. Third, the paper contains no test for EDM. This is in part because EDM is at no point compared to the predictions of alternative theories.

    My specific concerns are as follows:

    (1) The author claims that the most severe problem of EUT is that it is computationally implausible. However, I disagree. It could be claimed that EUT describes an (unattainable) optimal state that actual brains try to accomplish with limited resources. (In essence, the current paper follows this strategy). I think the situation is much more dire. During the last 70 years, a small army of psychologists and behavioral economists have described a large number of violations of EUT's normative predictions: the Allais paradox, framing effects, the behavioral tendencies summarized in Prospect theory, and others. These differences between behavior and normative predictions are important because they violate basic assumptions of the normative theory.

    (2) The most interesting case of such violations is a set of well-known behavioral effects that occur in the context of multi-alternative, multi-attribute decision making. They are known as the attraction, similarity, and compromise effects (there is a large literature; more recently: Dumbalska T, Li V, Tsetsos K, Summerfield C. A map of decoy influence in human multi alternative choice. Proc Natl Acad Sci U S A. 2020 Oct 6;117(40):25169-25178. doi: 10.1073/pnas.2005058117. Epub 2020 Sep 21). These biases have received so much attention because they violate a very basic axiom of EUT: choices between two options should not be affected by the presence of a third option that is inferior to both of them. However, that is exactly what happens in these choice biases. The effects have been shown in many species ranging from humans to amphibians to invertebrates. As far as I can see, EDM cannot explain how choice allocation between two options A and B that have equal value would be changed by the inclusion of a new option D that is of lower value than A or B, in such a way that D is not chosen at all but A is chosen more often than B if D is similar in attributes to A (the 'attraction' effect). If I am mistaken, the inclusion of an explanation of how this would work would be of major importance.

    (3) EDM as described in this manuscript is completely static; that is, it ignores the actual computational processes that underlie decision making. This is in opposition to an important modern branch of decision research that has stressed the importance of understanding processes (and their limitations) to understand how choices are made. Examples are: (1) Roe RM, Busemeyer JR, Townsend JT. Multialternative decision field theory: a dynamic connectionist model of decision making. Psychol Rev. 2001 Apr;108(2):370-92. doi: 10.1037/0033-295x.108.2.370. PMID: 11381834; (2) Tsetsos K, Usher M, Chater N. Preference reversal in multiattribute choice. Psychol Rev. 2010 Oct;117(4):1275-93. doi: 10.1037/a0020580. PMID: 21038979. The relationship between EDM and algorithmic implementations should be explored.

    (4) Most importantly, what is missing is a clear prediction for a finding (behavioral or neuronal) that would be predicted only by EDM and not by any other theory of decision making. Without such a proposed test, the idea has no scientific merit.

  4. Reviewer #2 (Public Review):

    In this article, Kubanek shows how simple, local decision strategies approximate optimal foraging behavior using analytical methods and model simulations. To ground the argument beyond model simulations, Kubanek generalizes previous theoretical frameworks in economics and foraging to show how evaluating relative utility and effort is sufficient to find optimal behavior. A particular strength of this study is its principled approach to linking general economic theory with foraging theory and deriving the conditions under which local behavioral strategies provide general and efficient solutions to the optimality problem. Re-casting utility and effort in relative terms offers attractive possibilities to apply these formulations in describing a range of phenomena. I, therefore, believe this short report will be of interest to a multidisciplinary audience from economics, psychology, behavior theory, foraging, and neuroscience. The author's main claims are supported by their evidence.

    Potential weaknesses of the study include:

    1. Predictions from the proposed EDM framework are stated in vague terms and could be formulated more concretely and, if possible, included in the model simulations.

    2. The specificity of the EDM model and related models is only briefly touched on. The EDM argument could be strengthened by making the relation to other behavioral models more explicit.

    3. Many behavioral situations, including the study by Sugrue et al. (2004) that is often cited in this paper, involve reward contingencies with a high level of uncertainty and non-stationary environments. While the author mentions these situations at the end of the discussion, it remains vague how exactly EDM performs in such environments or relates to decision strategies that deal with them.