Tracking subjects’ strategies in behavioural choice experiments at trial resolution

Curation statements for this article:
  • Curated by eLife

    eLife assessment

The authors introduce a potentially valuable novel method that provides trial-by-trial probabilistic estimates of learning and decision-making strategies inferred from choice behavior across species. This approach could prove more useful than traditional techniques for arbitrating between strategies and detecting when learning happens, and it is computationally lightweight. Reviewers identified several concerns that limit the strength of the evidence provided, rendering the findings incomplete.

Abstract

Investigating how, when, and what subjects learn during decision-making tasks requires tracking their choice strategies on a trial-by-trial basis. Here, we present a simple but effective probabilistic approach to tracking choice strategies at trial resolution using Bayesian evidence accumulation. We show that this approach identifies both successful learning and the exploratory strategies used in decision tasks performed by humans, non-human primates, rats, and synthetic agents. Both when subjects learn and when rules change, the exploratory strategies of win-stay and lose-shift, often considered complementary, are consistently used independently. Indeed, we find that the use of lose-shift is strong evidence that subjects have latently learnt the salient features of a new rewarded rule. Our approach can be extended to any discrete choice strategy, and its low computational cost is ideally suited for real-time analysis and closed-loop control.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

This article proposes a new statistical approach to identify which of several experimenter-defined strategies best describes a biological agent's decisions when such strategies are not fully observable from the choices made in a given trial. The statistical approach is described as Bayesian but can be understood instead as computing a smoothed running average (with decay) of the strategies' success at matching choices, with a winner-take-all inference across the rules. The article tests the validity of this statistical approach by applying it to both simulated agents and real data sets in mice and humans. It focuses on dynamically changing environments, where the strategy best describing a biological agent may change rapidly.

The paper asks an important question, the analysis is well conducted, and the text is well written and easy to follow. However, there are several concerns that limit the strength of the contribution. Major concerns include the framing of the method, considerations around the strategy space, limitations in how useful the technique may be, and missing details in analyses.

    Reviewer #2 (Public Review):

In this study, the goal is to leverage the power of Bayesian inference to estimate online the probability that any given, arbitrarily chosen strategy is being used by the decision-maker. By computing the trial-by-trial MAP and variance of the posterior distribution for each candidate strategy, the authors can not only see which strategy is primarily being used at any given time during the task and when strategy changes occur, but also detect when the target rule of a learning task becomes the front-running strategy, i.e., when successful learning occurs.

    Strengths:

    1. The proposed approach adds to recent methods for capturing the dynamics of decision-making at finer temporal resolution (trials) (Roy et al., 2021; Ashwood et al., 2022), but it is novel and differs from these in that it is especially well suited to analyzing when learning occurs, or when a rule switches and learning must recommence, and it does not necessitate large numbers of trials.
    2. The manuscript starts by validating the approach on synthetic data; the approach is then applied to datasets of trial-based two-alternative forced choice tasks ranging from rodent to non-human primate to human, providing solid evidence of its utility.
    3. Compared to classic procedures for identifying when an animal has learned a contingency, which typically need to be conservative in favor of accuracy, this method detects signs of learning earlier (~30 trials earlier on average). This is achieved by identifying the moment (trial) when the posterior probability of the correct "target" rule surpasses the probability of all other strategies. Having greater temporal precision in detecting when learning happens may have a very significant impact on studies of the neural mechanisms of learning.
    4. This approach seems amenable to testing many different strategies, depending on the purpose of the analysis. In the manuscript, the authors test target versus non-target strategies (correct versus incorrect) and, in another version of the analysis, they test what they call "exploratory" strategies.
    5. One of the main appeals of this method is its apparent computational simplicity. It necessitates only updating, on every trial, the parameters of a beta distribution (the prior distribution for a given strategy) with the evidence that the behavior on that trial was either consistent or inconsistent with the strategy. Two scalars, the mode of the posterior (MAP) and the inverse of the variance, are all that are required for identifying the decision criterion (highest MAP and, if tied, lowest variance) and the learning criterion (first trial where the MAP for the target strategy is higher than chance); a sketch of this update follows this list.
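
    As a concrete illustration of the update described in point 5, here is a minimal Python sketch; the function names and the exact placement of the decay are our assumptions, not the authors' code.

    ```python
    # Per-trial Bayesian update for a single candidate strategy.

    def update(alpha, beta, consistent, gamma=0.9):
        """Decay the Beta parameters, then add this trial's evidence.

        alpha, beta : Beta-posterior parameters from the previous trial
        consistent  : True if this trial's choice matched the strategy
        gamma       : decay in (0, 1]; gamma = 1 gives a stationary update
        """
        alpha = gamma * alpha + (1.0 if consistent else 0.0)
        beta = gamma * beta + (0.0 if consistent else 1.0)
        return alpha, beta

    def map_estimate(alpha, beta):
        """Mode of Beta(alpha, beta); defined for alpha, beta > 1."""
        return (alpha - 1.0) / (alpha + beta - 2.0)

    def precision(alpha, beta):
        """Inverse variance of Beta(alpha, beta), used to break ties."""
        ab = alpha + beta
        return (ab ** 2 * (ab + 1.0)) / (alpha * beta)
    ```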

    Weaknesses:

    1. It seems like a limitation of this approach is that the candidate strategies to arbitrate between must be known ex ante. It is not clear how this approach could be applied to uncover latent strategies that are not mixtures of the strategies selected.
    2. Different strategies may make identical predictions about choices, and thus it may not be possible to distinguish between them. Similarly, the fact that two strategies seem to be competing for the highest MAP does not necessarily mean that either is the correct strategy, or that the two are interchangeable, as the manuscript seems to suggest.
    3. The decay parameter is a necessary component to make the strategy estimates non-stationary and to accommodate datasets where the rules change throughout the task. However, the choice of bounds for the decay parameter's value does not seem very principled. Treating it as a free parameter adds a flexibility that seems to have significant effects on when a strategy switch is detected and how stable the detected switch is.
    4. This method is a useful approach for arbitrating between strategies and describing behavior with a temporal precision that may prove important for studies attempting to tie these precise events to changes in neural activity. However, it seems limited in its explanatory power. In its current form, the method does not predict the probability of transitioning from one strategy to another. And, because the MAPs of different strategies may be close at any given moment, it is hard to imagine using this approach to tease out the different "mental states" that correspond to each strategy being at play.

    The reviewers’ detailed comments, not shared here, helped us considerably in improving the paper, and we thank the reviewers for their time. We are unsure of the merits of sharing public reviews of a paper that has now changed considerably from the version those reviews address. Nonetheless, we shall address some key points of potential misunderstanding here.

    “The statistical approach is described as Bayesian but can be understood instead as computing a smoothed running average (with decay) of the strategies' success at matching choices, with a winner-take-all inference across the rules.”

    This is inaccurate. The algorithm performs sequential Bayesian updates on the evidence for and against the use of each strategy considered; for a given strategy i, its output at each trial is a fully parameterised posterior distribution over the probability of that strategy being used by the subject.

    We are careful in the paper to separate the algorithm’s output from our further use of that output. To plot and analyse the output we often make use of the maximum a posteriori (MAP) estimate from each posterior. Other choices are of course possible, and we discuss them in the text.
    In one set of simulations we quantify the results using a decision rule that chooses the strategy with the highest MAP; this is presumably the “winner-take-all inference” in the quoted text. We do not use this rule anywhere else in the paper, including the analyses of the four datasets, and so consider it not part of our method but one possible use of the algorithm's output.
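
    For concreteness, that decision rule can be sketched as follows, reusing the hypothetical map_estimate and precision helpers from the earlier sketch; this is one possible use of the algorithm's output, not part of the core method.

    ```python
    def best_strategy(posteriors, tol=1e-9):
        """Winner-take-all readout: highest MAP, ties broken by lowest variance.

        posteriors : dict mapping strategy name -> (alpha, beta)
        """
        maps = {s: map_estimate(a, b) for s, (a, b) in posteriors.items()}
        top = max(maps.values())
        tied = [s for s, m in maps.items() if abs(m - top) < tol]
        # Among tied strategies, prefer the most precise posterior
        return max(tied, key=lambda s: precision(*posteriors[s]))

    # e.g. best_strategy({"go_left": (5.2, 1.3), "win_stay": (4.8, 1.6)})
    ```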

    “Major concerns include the framing of the method, considerations around the strategy space, limitations in how useful the technique may be, and missing details in analyses”

    Our goal for this paper was to develop a computationally lightweight, trial-resolution, Bayesian approach to tracking the probability of user-specified strategies, so that we can capture the observer's evidence for learning or for the features driving exploratory choice (e.g. whether subjects are responding to losses or to wins, to cues or to their own prior choices, etc.). The above quote reflects the reviewer's detailed comments, in which we felt they wanted a solution to a different problem: a parameterised latent model of strategy use. While that is a perfectly valid research goal, it was not what we addressed here.

    “1) It seems like a limitation of this approach is that the candidate strategies to arbitrate between must be known ex-ante. It is not clear how this approach could be applied to uncover latent strategies that are not mixtures of the strategies selected.”

    The problem of knowing which strategies to analyse in advance only applies when running our algorithm in real time. The fact that it can be run in real time on modest computing hardware is, to us, one of its strengths, so we consider this a good problem to have.

    As noted above, rather than determine latent strategies, our goal was to build an observer model that allows users to specify whatever strategies they want in order to answer their scientific questions about their data: for example, to define when a particular rule has been learnt, or to look for changes in response to particular features of the environment, such as a cue, or to a drug treatment or other intervention.
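
    To make "user-specified" concrete, strategies can be encoded as simple predicates over the trial history. The encodings below are hypothetical illustrations, not the authors' definitions.

    ```python
    from collections import namedtuple

    # Hypothetical trial record: what was chosen, and was it rewarded?
    Trial = namedtuple("Trial", ["choice", "rewarded"])

    # Each predicate answers: was the current choice consistent with
    # this strategy, given the previous trial?
    strategies = {
        "go_left":    lambda prev, cur: cur.choice == "left",
        "win_stay":   lambda prev, cur: prev.rewarded and cur.choice == prev.choice,
        "lose_shift": lambda prev, cur: (not prev.rewarded) and cur.choice != prev.choice,
    }

    # Note: a trial where a strategy's precondition fails (e.g. win-stay
    # after a loss) would in practice contribute no evidence, rather than
    # being counted as inconsistent.
    ```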

    2. Different strategies may make identical predictions about choices, and thus it may not be possible to distinguish between them. Similarly, the fact that two strategies seem to be competing for the highest MAP does not necessarily mean that either is the correct strategy, or that the two are interchangeable, as the manuscript seems to suggest.

    As noted above, this is an observer model, and it is thus necessarily true that there are strategies the observer does not have sufficient evidence to distinguish. For example, a subject who continually chooses the rewarded left-hand lever is consistent with both a "go left" strategy and a "win-stay" strategy in response to their choices. The inability to distinguish strategies is a property of the data, not of the algorithm. Also as noted above, we do not here consider competition between strategies.

    3. The decay parameter is a necessary component to make the strategy estimates non-stationary and to accommodate datasets where the rules change throughout the task. However, the choice of bounds for the decay parameter's value does not seem very principled. Treating it as a free parameter adds a flexibility that seems to have significant effects on when a strategy switch is detected and how stable the detected switch is.

    The revised manuscript draws together the existing simulations and analysis of the method to directly address this point, showing that there is a principled range of the decay parameter within which the algorithm should operate. The Discussion also points out that this free parameter is no different from those of any frequentist approach to strategy analysis, which must choose a time window over which to compute the frequentist probability.
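
    To illustrate why a bounded range makes sense (our own illustration, assuming an exponential-decay update of the form sketched earlier, not the paper's derivation): unrolling the update gives

    $$\alpha_t = \gamma\,\alpha_{t-1} + x_t \;\Rightarrow\; \alpha_t = \sum_{k=0}^{t-1} \gamma^{k} x_{t-k} + \gamma^{t}\alpha_0,$$

    so evidence from k trials back is down-weighted by γ^k and the update behaves like a sliding window of roughly 1/(1 − γ) trials (e.g. γ = 0.9 gives an effective window of about 10 trials, while γ = 1 gives infinite memory). Choosing γ is therefore directly analogous to choosing the time window of a frequentist running estimate.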

    4. This method is a useful approach for arbitrating between strategies and describing behavior with a temporal precision that may prove important for studies attempting to tie these precise events to changes in neural activity. However, it seems limited in its explanatory power. In its current form, the method does not predict the probability of transitioning from one strategy to another. And, because the MAPs of different strategies may be close at any given moment, it is hard to imagine using this approach to tease out the different "mental states" that correspond to each strategy being at play.

    As noted above, this is an observer model and is not intended to infer mental states: the goal is to make accurate statements about observable behaviour. We agree that an interesting extension of this approach would be to model the transitions between strategies, and had already outlined this in the Discussion.
