Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    This study used a multidimensional stimulus-response mapping task to determine how monkeys learn and update complex rules. The subjects had to use either the color or shape of a compound stimulus as the discriminative dimension that instructed them to select a target in different spatial locations on the task screen. Learning occurred across cued block shifts when an old mapping became irrelevant and a new rule had to be discovered. Because potential target locations associated with each rule were grouped into two sets that alternated, and only a subset of possible mapping between stimulus dimensions and response sets were used, the monkeys could discover information about the task structure to guide their block-by-block learning. By comparing behavioral models that assume incremental learning, quantified by Q-learning, Bayesian inference, or a combination, the authors show evidence for a hybrid strategy in which animals use inference to change among response sets (axes), and incremental learning to acquire new mappings within these sets.

    Overall, I think the study is thorough and compelling. The task is cleverly designed, the modeling is rigorous, and the manuscript is clear and well-written. Importantly there are large enough distinctions in the behavior generated by different models to make the authors' conclusions convincing. They make a strong case that animals can adopt mixed inference/updating strategies to solve a rule-based task. My only minor question is about the degree to which this result generalizes beyond the particulars of this task.

    Thanks for these kind comments. Regarding generalization, we agree with the reviewer and did not intend to make any claim about how the particular result generalizes beyond this task. Indeed, the specific result could depend on the training protocol even within the same task. We now discuss this explicitly in the manuscript, lines 800-810. However, we do take the view that even if the way the monkeys’ behavior played out in this setting is a lucky accident, it may still reveal something fundamental about learning processes in the brain.

    Reviewer #2 (Public Review):

    The authors trained two monkeys to perform a task that involved sequential (blocked) but unsignalled rules for discriminating the colour and shape of a visual stimulus, by responding with a saccade to one of four locations. In rules 1 and 3, the monkeys made shape (rule 1) or colour (rule 3) discriminations using the same response targets (upper left / lower right). In rule 2, the monkeys made colour judgments using a unique response axis (lower left / upper right). The authors report behaviour, with a focus on time to relearn the rules after an (unsignalled) switch for each rule, discrimination sensitivity for partially ambiguous stimuli, and the effect of congruency. They compare the ability of models based on Q-learning, Bayesian inference, and a hybrid to capture the results.

    The two major behavioural observations are (1) that monkeys re-learn faster following a switch to rule 2 (which occurs on 50% of blocks and involves a unique response axis), and (2) that monkeys are more sensitive to partially ambiguous stimuli when the response axis is unique, even for a matched feature (colour). These data are presented clearly and convincingly and, as far as I can tell, they are analysed appropriately. The former finding is not very surprising as rule 2 occurs most frequently and follows each instance of rule 1 or 3 (which is why the ideal observer model successfully predicts that the monkeys will switch by default to rule 2 following an error on rules 1 or 3) but it is nevertheless reassuring that this behaviour is observed in the animals. It additionally clearly confirms that monkeys track the latent state that denotes an uncued rule.
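
    The default switch to rule 2 described above can be illustrated with a small belief update. This is only a minimal sketch of latent-state inference, not the authors' ideal observer model: the hazard rate, the error likelihoods, the equal split between rules 1 and 3 after rule 2, and the names `TRANSITIONS`, `update_belief` and `belief_after_errors` are all assumptions made for the example.

```python
import numpy as np

# Rows: current rule, columns: next rule (order: rule 1, rule 2, rule 3).
# Rule 2 follows every rule 1 or rule 3 block (as stated in the review).
# Assumption: rule 2 is followed by rule 1 or rule 3 with equal probability.
TRANSITIONS = np.array([
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
])

# Hypothetical likelihood of an unrewarded trial when the animal responds
# as rule 1 dictates: rare if rule 1 still holds, certain under rule 2
# (wrong response axis), about chance under rule 3 (shared axis).
ERROR_LIKELIHOOD = np.array([0.05, 1.0, 0.5])


def update_belief(belief, likelihood, hazard=0.02):
    """One Bayesian step: allow for an unsignalled switch, then weigh the outcome."""
    # With probability `hazard` (an assumed value) the block switched before this trial.
    prior = (1.0 - hazard) * belief + hazard * (belief @ TRANSITIONS)
    posterior = prior * likelihood
    return posterior / posterior.sum()


def belief_after_errors(n_errors, start=(1.0, 0.0, 0.0)):
    """Belief over (rule 1, rule 2, rule 3) after consecutive errors, starting in rule 1."""
    belief = np.array(start, dtype=float)
    for _ in range(n_errors):
        belief = update_belief(belief, ERROR_LIKELIHOOD)
    return belief


if __name__ == "__main__":
    for n in range(4):
        print(n, np.round(belief_after_errors(n), 3))
```

    Under these assumed numbers, a few consecutive errors while following rule 1 move most of the belief to rule 2 rather than rule 3, because rule 3 never directly follows rule 1 in the block sequence.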

    The latter finding is more interesting and seems to have two potential explanations: (i) sensitivity is enhanced on rule 2 because it occurs more frequently; (ii) sensitivity is enhanced on rule 2 because it has a unique response axis (and thus involves less resource sharing/conflict in the output pathway).

    The authors do not directly distinguish between these hypotheses per se, but their modelling exercise shows that both results (and some additional constraints) can be captured by a hybrid model that combines Bayesian inference and Q-learning, but not by models based on either principle alone. A Q-learning model fails to capture the latent state inference and/or the rule 2 advantage. The Bayesian inference model captures the rapid switches to rule 2 (which are more probable following errors on rule 1 and rule 3) but predicts matched discrimination performance for partially ambiguous stimuli on colour rules 2 and 3. This is because although knowing the most likely rule increases the probability of a correct response overall, it does not increase discriminability and thus boosts the more ambiguous stimuli. I wondered whether it might be possible to explain this result with the addition of an attention-like mechanism that depends on the top-down inference about the rule. For example, greater certainty about the rule might increase the gain of discrimination (psychometric slope) in a more general way.

    We agree with the reviewer that our logic in ruling out pure inference models assumes that other factors affecting performance, like attention or motivation, are equivalent between blocks. In principle, if there were large and sustained differences in these factors between Rule 2 and Rule 1 or 3 blocks, that might offer a different explanation for the effect. We now mention this caveat in the manuscript. However, we are not quite sure how to develop the reviewer’s particular idea into a full account of the behavior, since (as we show in Fig. 3a,b,c, and Fig. S4a,b,c) the difference in psychometric slopes lasts at least 200 trials into the rule, even when (in the hybrid learning model) the feature weights have converged (Figure 4 – figure supplement 2). It’s hard to see why elevated uncertainty about the rule would persist this long in anything resembling an informed ideal observer model.

    The authors propose a hybrid model in which there is an implicit assumption that the response axis defines the rule. The model infers the latent state like an ideal observer but learns the stimulus-response mappings by trial and error. This means that the monkeys are obliged to constantly re-learn the response mappings along the shared response axis (for rules 1/3) but they remain fixed for rule 2 because it has a unique response axis. This model can capture the two major effects, and for free captures the relative performance on congruent and incongruent trials (those trials where the required action is the same, or different, for given stimuli across rules) on different blocks.
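
    To make the hybrid idea concrete, here is a minimal sketch, assuming a much simpler setup than the paper's: two response axes, two binary features, Bayesian inference over which axis is correct, and a delta-rule update of feature weights on the chosen axis. It is not the authors' model. The class `HybridAgent`, the `RULES` encoding, all parameter values, and the choice to scale learning by the posterior belief in the chosen axis are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Hypothetical encoding: rule -> (correct response axis, index of relevant feature).
# Axis 0 is shared by rules 1 and 3; axis 1 is unique to rule 2.
# Feature 0 = colour, feature 1 = shape, each coded as -1 or +1.
RULES = {1: (0, 1), 2: (1, 0), 3: (0, 0)}


class HybridAgent:
    def __init__(self, lr=0.2, hazard=0.05, q=0.75, eps=0.01, seed=0):
        self.lr, self.hazard, self.q, self.eps = lr, hazard, q, eps
        self.belief = np.array([0.5, 0.5])  # P(axis 0 correct), P(axis 1 correct)
        self.w = np.zeros((2, 2))           # w[axis] = weights on (colour, shape)
        self.rng = np.random.default_rng(seed)

    def act(self, features):
        # Pick the more probable axis, then a target on that axis from the learned weights.
        axis = int(np.argmax(self.belief))
        p_target1 = sigmoid(self.w[axis] @ features)
        target = int(self.rng.random() < p_target1)
        return axis, target

    def learn(self, features, axis, target, reward):
        # Inference step: a switch (probability `hazard`) always flips the correct axis,
        # because rule 2 alternates with rules 1 and 3.
        prior = (1 - self.hazard) * self.belief + self.hazard * self.belief[::-1]
        lik = np.empty(2)
        # Assumed outcome model: reward has probability q if the chosen axis is
        # correct and probability eps if it is not.
        lik[axis] = self.q if reward else 1 - self.q
        lik[1 - axis] = self.eps if reward else 1 - self.eps
        post = prior * lik
        self.belief = post / post.sum()
        # Incremental step: delta rule on the chosen axis only.
        # Assumption: the update is scaled by the belief that this axis is correct.
        sign = 1.0 if target == 1 else -1.0
        predicted = sigmoid(sign * (self.w[axis] @ features))
        self.w[axis] += self.lr * self.belief[axis] * (reward - predicted) * sign * features


def run_block(agent, rule, n_trials, rng):
    correct_axis, relevant = RULES[rule]
    rewards = []
    for _ in range(n_trials):
        features = rng.choice([-1.0, 1.0], size=2)
        axis, target = agent.act(features)
        correct_target = int(features[relevant] > 0)
        reward = int(axis == correct_axis and target == correct_target)
        agent.learn(features, axis, target, reward)
        rewards.append(reward)
    return np.array(rewards)


def simulate(seed=0, n_trials=300):
    """Run the block sequence rule 2, rule 1, rule 2, rule 3 and return rewards per block."""
    rng = np.random.default_rng(seed)
    agent = HybridAgent(seed=seed + 1)
    results = {}
    for block, rule in enumerate([2, 1, 2, 3]):
        results[block] = run_block(agent, rule, n_trials, rng)
    return agent, results


if __name__ == "__main__":
    agent, results = simulate()
    print("early accuracy, return to rule 2:", results[2][:50].mean())
    print("early accuracy, switch to rule 3:", results[3][:50].mean())
```

    In this toy version the weights on the unique axis are left largely intact while the agent works on the shared axis, so performance recovers quickly on returning to rule 2. The weights on the shared axis, by contrast, have to be relearned when rule 3 replaces rule 1.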

    I found the authors' account to be plausible but it seemed like there might be other possible explanations for the findings. In particular, having read the paper I remained unclear as to whether it was the sharing of response axis per se that drove the cost on rule 3 relative to 2, or whether it was only because of the assumption that response axis = rule that was built into the authors' hybrid model. It would have been interesting to know, for example, whether a similar advantage for ambiguous stimuli on rule 2 occurred under circumstances where the rule blocks occurred randomly and with equal frequency (i.e. where there was response axis sharing but no higher probability); or even whether, if the rule was explicitly signalled from trial to trial, the rule 2 advantage would persist in the absence of any latent state inference at all (this seems plausible; one pointer for theories of resource sharing is this recent review: https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(21)00148-0?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS1364661321001480%3Fshowall%3Dtrue). No doubt these questions are beyond the scope of the current project but nevertheless it felt to me that the authors' model remained a bit tentative for the moment.

    Thanks for these interesting thoughts. It is true that the imbalanced pattern of sharing (of response axes, and actually also features) across the three rules has important consequences for learning/inference under our model (and indeed other latent state inference models such as the informed ideal observer). It is an intriguing idea that these features of the design might cause interference per se, for instance even when the rules are fully signaled and no inference or learning is needed. We agree this (and the other case the reviewer mentioned) is an interesting direction for future work. We have added this in the discussion, lines 800-812.

  2. eLife assessment

    This study modeled monkeys' behavior in a stimulus-response rule-learning task to show that animals can adopt mixed strategies involving inference for learning latent states and incremental updating for learning action-outcome associations. The task is cleverly designed, the modeling is rigorous, and importantly there are clear distinctions in the behavior generated by different models, which makes the authors' conclusions convincing. The study makes a strong contribution overall; however, some aspects of the design were unclear and some alternative accounts were not considered.
