Human decision-makers terminate evidence accumulation using flexible decision rules

Abstract

Decisions based on evidence accumulated over time require rules governing when to end the accumulation process and commit to a choice. These rules control inherent trade-offs between decision speed and accuracy, which must be balanced carefully to maximize quantities, such as reward rate, that depend on both. We previously showed that, to maximize reward rate, normative decision rules adapt to changing task conditions (Barendregt et al., 2022). Here we used a novel task to examine whether and how people use adaptive rules for individual decisions under a variety of conditions, including changes in decision outcomes across trials and changes in evidence quality both across and within trials. We found that the participants tended to use rules that adjusted, at least partially, to predictable changes in task conditions to improve reward rate, consistent with a rationally bounded implementation of normative principles. These findings help inform our understanding of the extent and limits of flexible decision formation in the brain.

Joint Public Review:

Summary:

Kalburge et al. investigate a task in which human subjects make a decision based on the accumulation of noisy evidence. Tasks like this have been studied for decades, but always with the same essential ingredient: noisy moment-by-moment evidence has to be integrated internally by the subjects, and so is not observed by the experimenter.

In this study, the authors depart from this scenario and make the evidence visible. Specifically, subjects see a pigeon moving stochastically on a screen, and they have to determine whether the net motion is to the right or to the left. This provides the experimenter direct access - on a trial-by-trial basis - to the bounds the subjects use to make their decision.

The authors apply this paradigm across a range of tasks, each one differing in how the signal-to-noise ratio (SNR; defined to be the ratio of the drift rate of the pigeons to the standard deviation of the noise) changes over time and across trials. The tasks range from the standard case of constant SNR to the non-standard case where the SNR changes abruptly in the middle of the task.
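To make the task structure concrete, the following is a minimal sketch of a single constant-SNR trial, not the authors' implementation. It assumes the pigeon takes independent Gaussian steps and that the subject commits as soon as the absolute position reaches a fixed bound; the function names, parameters, and the step distribution are all hypothetical choices made for illustration.

```python
import random


def snr(drift, noise_sd):
    """Signal-to-noise ratio as defined in the review: drift rate over noise SD."""
    return abs(drift) / noise_sd


def simulate_trial(drift, noise_sd, bound, max_steps=10_000, rng=None):
    """Simulate one trial of a visible random walk (hypothetical model).

    The position takes Gaussian steps with mean `drift` and standard
    deviation `noise_sd`. The trial ends at the first step on which
    |position| >= bound. Returns (choice, steps_taken, final_position);
    choice is None if the bound was never reached within max_steps.
    """
    rng = rng or random.Random()
    position = 0.0
    for step in range(1, max_steps + 1):
        position += rng.gauss(drift, noise_sd)
        if abs(position) >= bound:
            choice = "right" if position > 0 else "left"
            return choice, step, position
    return None, max_steps, position


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    choice, steps, position = simulate_trial(0.1, 1.0, 5.0, rng=rng)
    print(f"SNR={snr(0.1, 1.0)}, choice={choice}, steps={steps}, position={position:.2f}")
```

Because the position is observable at every step, the final position returned here plays the role of the trial-by-trial bound readout that the paradigm provides. Note that with discrete steps the final position overshoots the bound.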

The authors determined, on a trial-by-trial basis, the bounds used by the subjects. Setting the bounds optimally when the SNR changes over time or across trials is a non-trivial problem; not surprisingly, then, the subjects were suboptimal. However, they weren't very suboptimal; instead, their behavior was "satisficing" (in the words of the authors), meaning their bounds were reasonably close to the optimal ones. Since the loss is relatively flat near the maximum, and finding the optimal bounds is hard, this is a sensible strategy.
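The flatness of reward rate near its maximum can be illustrated with a minimal sketch. It assumes a continuous-time diffusion with constant drift, symmetric fixed bounds, an unbiased starting point, a reward of 1 for correct choices and 0 for errors, and a fixed inter-trial interval. It uses the standard closed-form accuracy and mean decision time for that model; the parameter values are hypothetical, and the authors' task and reward structure may differ.

```python
import math


def accuracy(drift, noise_sd, bound):
    """Probability of ending at the correct bound for a symmetric, unbiased diffusion."""
    return 1.0 / (1.0 + math.exp(-2.0 * drift * bound / noise_sd**2))


def mean_decision_time(drift, noise_sd, bound):
    """Mean first-passage time to either bound, starting from 0 (requires drift > 0)."""
    return (bound / drift) * math.tanh(drift * bound / noise_sd**2)


def reward_rate(drift, noise_sd, bound, inter_trial_interval):
    """Correct choices per unit time, assuming reward 1 for correct and 0 for errors."""
    total_time = mean_decision_time(drift, noise_sd, bound) + inter_trial_interval
    return accuracy(drift, noise_sd, bound) / total_time


# Hypothetical parameter values chosen only for illustration.
DRIFT, NOISE_SD, ITI = 1.0, 1.0, 2.0
bounds = [0.05 * k for k in range(1, 61)]
rates = [reward_rate(DRIFT, NOISE_SD, b, ITI) for b in bounds]
best_rate = max(rates)
best_bound = bounds[rates.index(best_rate)]

# Bounds whose reward rate is within 5% of the best rate on this grid.
near_optimal = [b for b, r in zip(bounds, rates) if r >= 0.95 * best_rate]
print(f"best bound ~ {best_bound:.2f}; bounds within 5% of the best rate: "
      f"{near_optimal[0]:.2f} to {near_optimal[-1]:.2f}")
```

Under these assumptions, a fairly wide range of bound heights yields nearly the same reward rate, which is the sense in which a "satisficing" bound costs little.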

Strengths:

The main strength of this work is the introduction of a new paradigm that supports a trial-by-trial measure of the decision bound. This allows direct measurement of the bound at decision time within individual trials. This, in turn, allows experimenters to determine whether the decision bound differs across decision time or fluctuates for the same decision time across trials. This is harder, although not impossible, to do with tasks in which decision bounds have to be estimated across multiple trials, especially when the SNR is changing.

The authors use this paradigm to show that the decision bounds are mostly constant when the SNR is constant within and across trials. This has been shown indirectly before by fitting models with different parametric boundary shapes, but not directly by measuring the boundary separately for different decision times (but see Kira, Yang, and Shadlen, 2015). They also demonstrate that variability in these bound estimates arises from measurement noise rather than trial-by-trial variability in bound heights, something that could not have been done with previous paradigms.

They furthermore replicate findings that subjects adjust their bounds, including weak collapse, to changing reward contingencies and SNRs, further validating their paradigm. And finally, the work demonstrates an apparent within-trial bound change if the SNR changes (predictably) mid-trial, as predicted by their previous work (Barendregt et al., 2022). This is -- to our knowledge -- the first confirmation of this prediction.

Weaknesses:

There are two non-technical weaknesses.

First, comparison to optimal behavior was mainly qualitative; a quantitative comparison would greatly strengthen the work.

Second (although not exactly a weakness), the work does not leverage the full potential of trial-by-trial estimates of the decision bound, which is a missed opportunity. To our understanding, the only finding that relied on trial-by-trial access to the bound was that the variability in the bound estimate was a major source of measurement noise. Their finding that the bound changes in response to reward contingencies and SNR, on the other hand, did not require such a trial-by-trial estimate. However, with this task (and not with standard paradigms), the authors could determine how the bounds change during learning, which would give insight into the learning rules that participants use to adjust their bounds.

There are also a few technical issues.

(1) The authors argue that they don't observe a collapsing bound when the SNR varied across blocks (Figure 5). However, they only seem to perform this analysis on the difference in boundaries between trials with different SNRs (Figures 5B, D). Observing a zero difference implies that the boundary shape is the same across SNRs, but does not rule out a collapse.

(2) The evidence for a within-trial boundary change for conditions with a within-trial SNR change could be stronger. The data shown in Figures 6C, D are very noisy, and there are no error bars. For individual participants, is the estimated change in bound larger than the variability in bound estimates before and after the SNR changepoint? Are there potentially other measures that could be used to make the point of a clear change in boundary within individual trials more convincing?

(3) The work assumes that bound height estimates are biased due to the bounded accumulation nature of the decision process, and it corrects for these biases with a simulation-based correction (Methods and Figure 7). To our understanding, this correction assumes that the decision time is the first time that this boundary is crossed. However, the authors do not demonstrate that this is the strategy that participants use; they need to explicitly rule out the possibility that there are significant pigeon excursions across the boundary before the decision time.

(4) The authors did not consider other stopping rules, such as a decision based on the last few trials. Showing that a stopping rule based purely on the bound fits the data better than other possible rules would strengthen the manuscript.
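The check requested in point (3) could be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes each trial is available as a list of pigeon positions ending at the sample on which the participant responded, and it treats the absolute position at the response as the bound estimate for that trial. The function name and data layout are hypothetical.

```python
def prior_excursions(trajectory):
    """Return indices of earlier samples at or beyond the height reached at decision.

    `trajectory` is a list of positions for one trial, with the last entry
    being the position when the participant responded. Under a first-crossing
    rule with a flat bound, no earlier sample should reach that height, so a
    non-empty result flags a trial that is inconsistent with that rule.
    """
    if len(trajectory) < 2:
        return []
    decision_height = abs(trajectory[-1])
    return [i for i, x in enumerate(trajectory[:-1]) if abs(x) >= decision_height]


if __name__ == "__main__":
    consistent_trial = [0.0, 1.0, 2.0, 3.0]
    inconsistent_trial = [0.0, 3.5, 1.0, 3.0]
    print(prior_excursions(consistent_trial))
    print(prior_excursions(inconsistent_trial))
```

The fraction of trials with a non-empty result would give one simple summary of how often the first-crossing assumption is violated. This check is only meaningful for a flat bound; with a collapsing bound, earlier samples above the final height are expected even under a first-crossing rule.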


Curated by eLife
