Probing the decision-making mechanisms underlying choice between drug and nondrug rewards in rats

Curation statements for this article:
  • Curated by eLife

    Summary:

    In this manuscript, the authors perform a retrospective analysis in an attempt to delineate the role of goal-directed versus habitual mechanisms underlying choice between drug and non-drug rewards. Specifically, the authors used data generated in their laboratory to assess cocaine-versus-saccharin choice following limited and extended training paradigms. A sequential choice model was used to assess the prediction that increased latencies during choice reflect goal-directed control, whereas no change in latencies reflects habitual control. Based on this model, the authors report that rats engage in goal-directed control after limited training, and adopt more habitual responding after extended training. The authors conclude that the sequential choice model is specific to habitual choice.

    While the Reviewers appreciate the approach and conceptual framework described in this manuscript, they are all in agreement that additional data and analyses are needed to better support the claims surrounding goal-directed versus habitual control of reward-seeking behavior. For example, an independent evaluation of whether the target behavior is in fact goal-directed or habitual seems necessary to support such claims. Reviewers’ comments and suggestions for improvement are included below.

    Reviewer #1 and Reviewer #2 opted to reveal their names to the authors in the decision letter after review.

This article has been Reviewed by the following groups

Abstract

Delineating the decision-making mechanisms underlying choice between drug and nondrug rewards remains a challenge. This study adopts an original approach to probe these mechanisms by comparing response latencies during sampling versus choice trials. While a lengthening of latencies during choice is predicted by a deliberative choice model (DCM), the race-like response competition mechanism postulated by the sequential choice model (SCM) predicts a shortening of latencies during choice compared to sampling. Here, we tested these predictions by conducting a retrospective analysis of cocaine-versus-saccharin choice experiments conducted in our laboratory. We found that rats engage deliberative decision-making mechanisms after limited training, but adopt an SCM-like response selection mechanism after more extended training, when their behavior is presumably habitual. Thus, the DCM and SCM may not be general models of choice, as initially formulated, but could be dynamically engaged to control choice behavior across early and extended training.
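
The race prediction described in the abstract can be illustrated with a minimal simulation sketch. It assumes, purely for illustration, that each option's latency is drawn independently from a log-normal distribution and that the choice latency is whichever of the two finishes first. The function name `simulate_race` and all parameter values are hypothetical and are not taken from the article.

```python
import random
import statistics


def simulate_race(n_trials=10000, mu_a=1.0, mu_b=1.2, sigma=0.5, seed=0):
    """Compare mean latencies in sampling trials and in race-based choice trials.

    Assumption (illustrative only): latencies are independent log-normal draws;
    mu_* and sigma are the mean and SD of the underlying normal distribution.
    """
    rng = random.Random(seed)
    sample_a, sample_b, choice = [], [], []
    for _ in range(n_trials):
        lat_a = rng.lognormvariate(mu_a, sigma)  # latency of option A when offered alone
        lat_b = rng.lognormvariate(mu_b, sigma)  # latency of option B when offered alone
        sample_a.append(lat_a)
        sample_b.append(lat_b)
        # In a race, the first response to complete is the one expressed.
        choice.append(min(lat_a, lat_b))
    return (statistics.mean(sample_a),
            statistics.mean(sample_b),
            statistics.mean(choice))


mean_a, mean_b, mean_choice = simulate_race()
print(f"sampling A: {mean_a:.2f}  sampling B: {mean_b:.2f}  choice: {mean_choice:.2f}")
```

Because the choice latency on every simulated trial is the minimum of the two sampling latencies, its mean cannot exceed either sampling mean. This is the shortening that a race-like mechanism predicts, under the stated assumptions.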

Article activity feed

  1. Reviewer #3:

    Behaviours that are instrumental for producing reward can be either goal-directed or, after repeated practice, habitual. Tasks that dissociate these types of learning, notably outcome devaluation, are tricky to implement for studying intravenous drug delivery. Because there is great interest in understanding the role of habits in controlling drug use and addiction, this paper is important in that regard. This article takes a new approach, analyzing response latencies to infer the type of decision-making process that underlies a reward-seeking behaviour. Goal-directed behaviours are argued to involve evaluation of the outcome of responding and/or deliberation between choices, both of which should take time and slow responding relative to an efficient but inflexible habit. So I think this approach is quite interesting. The paper is well written and the predictions are clear.

    My main issue in evaluating the current article is that while different predictions are made about when response latency should be relatively fast or slow, since the article is framed in terms of dissociating goal-directed and habitual processes, I feel there should be some independent evaluation of whether the target behaviour is in fact goal-directed or habitual. The authors rely on the amount of training as extended training has been shown to promote habitual control. However, exactly how much training is needed and how other parameters (type of reward, schedules of reinforcement, choice or single outcome) affect when habitual control may emerge varies widely in the literature and I don't think we can take for granted that after a certain amount of training responding will be habitual without testing that.

    It is also important to consider alternative explanations for differences in response latency. A behaviour that is well-practiced might well be expected to become more efficient and faster. This need not be due to habit formation. The authors acknowledge the possibility that responding could be at floor but don't really discuss it or whether it might apply more to the saccharin response.

  2. Reviewer #2:

    When animals are given a choice between drug and nondrug reinforcers, they will most often choose the nondrug alternative even when presented with highly reinforcing drugs of abuse. This is difficult to reconcile with known behavior in humans and complicates the modeling of aspects of addiction that are critical to the disorder, such as choosing to use drugs above all other reinforcers. Recent work by this same group has reported that responding for the nondrug reinforcer is, surprisingly, insensitive to devaluation. This suggests that the choice for the nondrug reinforcer is under habitual, rather than the presumed goal-directed, control and may explain why animals most often choose the nondrug reinforcer over drug reinforcers. Moreover, because there is no devaluation procedure for determining whether drug choice is habitual or goal-directed, it is not known whether choice for drug is also habitual or remains goal-directed.

    The manuscript by Vandaele et al., therefore, sought to develop a procedure for determining whether the behavior of rats making choices between saccharin and cocaine reinforcers was habitual or goal-directed based on reaction times (RTs). Based on previous theories, the authors argue that goal-directed behavior should have slower RTs on choice trials versus sampling trials (e.g., because animals are deliberating between the alternatives), whereas habitual behavior should have similar RTs across both sampling and choice trials. The authors also present a third possibility in which options are evaluated sequentially, rather than simultaneously, resulting in RTs being longer in sampling versus choice trials. The authors report that rats with minimal training, which are presumed to be goal-directed, have slower RTs in choice trials compared to sampling trials, whereas rats that have had extensive training have similar RTs in the choice and sampling phases. These findings are consistent with their hypotheses. Moreover, they demonstrate that in the small subset of rats that prefer cocaine over saccharin, RTs in sampling trials are longer than those in choice trials, suggesting that cocaine-preferring rats are not evaluating each of the options. These data are the first to evaluate habitual responding for a drug reinforcer and suggest that comparing latencies across different task phases could be used to measure habitual and goal-directed behaviors.

  3. Reviewer #1:

    Vandaele et al. probe the mechanisms of decision making in rats when making a forced choice between drug and non-drug reward. The authors have led the field in this domain. In this manuscript, a retrospective analysis of choice response times from many rats in their past work is used to tease out potential decision-making mechanisms. We know already from decades of work that choice response times are almost always log-normally distributed (humans, non-human primates, rodents). The question here is whether differences in the mean and dispersion of these distributions can be used to derive insights into the nature of the decision-making mechanism - a deliberative comparison versus a race model - and how this may differ for rats that prefer cocaine over saccharin and how this might be altered by more extended training. These questions are framed in terms of the differences between goal-directed and habitual behavior which, to be frank, I found less compelling (these response time data are of significant interest in their own right). I enjoyed reading this manuscript. It was thoughtful and well presented. I have only two comments.

    First, much, if not all, of the absolute differences between latencies in sample and choice phases appear to be carried by the sample rather than the choice phase. Choice latencies for cocaine preferring rats, saccharin preferring rats, and the indifferent rats are all very similar. In contrast, the sampling latencies for cocaine preferring rats and the indifferent rats are longer. I am not sure why this should be. My reading was that the authors were more concerned with the choice side of the experiment being different, not the sample phase. Is this predicted by the models being tested? I struggled to understand why an SCM-like model would predict the difference being in the sample phase. Either way, the authors could be clearer about where the difference is expected to lie and why the sample phase is so obviously different in some conditions and the choice phase so similar.

    Second, the main and real issue for me is whether the differences between response latencies in the sample versus choice phases plausibly reflect operation of different decision-making mechanisms (race model versus deliberative processing) or different operation of the same decision-making mechanism. I don't know the answer, and I could not really derive it from the data and modelling provided. The authors frame the differences in response time as being uniquely predicted or explained by different forms of choice. The models that the authors are using are closely linked to, and intellectually derived from, models of human choice reaction time. The most successful of these models are the diffusion decision model (DDM) (Ratcliff, R., Smith, P.L., Brown, S.D., and McKoon, G. (2016). Diffusion Decision Model: Current Issues and History. Trends in Cognitive Sciences 20, 260-281) and the linear ballistic accumulator (LBA) (Brown, S.D., and Heathcote, A. (2008). The simplest complete model of choice response time: linear ballistic accumulation. Cognitive Psychology 57, 153-178).

    Even though the DDM and LBA adopt different architectures to each other (but the same architectures as those in Supp Fig 1A), they are intended to explain the same data. Of relevance, the same model (a DDM or an LBA) can explain differences in both the response distribution and the mean response time via changes in the starting point of evidence accumulation, the rate of evidence accumulation, and/or the boundary or threshold at which evidence is translated into choice behavior. So, for either a difference accumulator model (DDM) or a race model (LBA), the difference between sampling and choice performance could reflect changes in how the model is operating between these two phases, including a change in the starting point of the decision [bias], a change in the rate of accumulation [evidence], a change in threshold [caution], or a collapsing-boundary scenario, rather than reflecting operation of a completely different decision-making mechanism.
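
    To make this point concrete, here is a minimal simulation sketch of an LBA-style race, written for illustration only. Each accumulator starts at a uniformly drawn point, rises linearly at a normally distributed rate, and the first to reach threshold determines the response. The function names (`lba_trial`, `mean_rt`) and all parameter values are hypothetical; nothing here is fitted to, or taken from, the manuscript's data.

```python
import random
import statistics


def lba_trial(rng, drifts, threshold, start_max=0.5, drift_sd=0.3, t0=0.2):
    """Simulate one LBA-style trial; return (winning accumulator index, response time).

    Assumptions (illustrative): start points ~ Uniform(0, start_max), rates ~
    Normal(drift, drift_sd), t0 is non-decision time, and threshold > start_max.
    """
    while True:
        times = []
        for v in drifts:
            start = rng.uniform(0.0, start_max)  # starting point (bias)
            rate = rng.gauss(v, drift_sd)        # accumulation rate (evidence)
            # An accumulator with a non-positive rate never reaches threshold.
            times.append((threshold - start) / rate if rate > 0 else float("inf"))
        fastest = min(times)
        if fastest != float("inf"):  # redraw the trial if no accumulator can finish
            return times.index(fastest), t0 + fastest


def mean_rt(drifts, threshold, n_trials=5000, seed=1):
    """Mean simulated response time for a given threshold (caution) setting."""
    rng = random.Random(seed)
    return statistics.mean(lba_trial(rng, drifts, threshold)[1] for _ in range(n_trials))


low_caution = mean_rt(drifts=(1.0, 0.8), threshold=1.0)
high_caution = mean_rt(drifts=(1.0, 0.8), threshold=1.5)
print(f"mean RT, threshold 1.0: {low_caution:.2f}; threshold 1.5: {high_caution:.2f}")
```

    In this sketch the architecture is identical in both runs and only the threshold differs, yet mean response time lengthens. A latency difference between two conditions therefore does not, by itself, distinguish a change of mechanism from a change of parameter within one mechanism.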

    In thinking of a way forward I readily concede I could be wrong and the authors may effectively rebut this point. Another option could be to acknowledge this possibility and discuss it. E.g., does it really matter if it is a qualitatively different decision-making process or different operation of the same decision-making mechanism? I don't really think the action-habit distinction lives or dies by reaction/response time data, this distinction is almost certainly far less absolute than often portrayed in the addiction literature, and it is generally intended as an account of what is learned rather than an account of how that learning is translated into behaviour (even if an S-R mechanism provides an account of both). Response time data tell me, at least, something different about how what has been learned is translated into behaviour. The third, marginally more difficult but more interesting option, would be to explore these issues formally and to move beyond simple descriptive or LDA analyses of response time distributions. The LBA has a full analytical solution and there are reasonable approximations for the DDM. Formal modelling of choice response times (e.g., Bayesian parameter estimation for a race model or DDM) could indicate whether a single decision-making mechanism (LBA or DDM or something else) can explain response times under both sample and choice conditions or not. This is a standard approach in cognitive modelling. This would be compelling if it showed the dissociation the authors argue - i.e. one model cannot be fit to both sample and choice datasets for all animals. However, if one model can be fit to both, then formal modelling would show which decision making parameters change between the sample and choice conditions for cocaine v saccharin v individual animals to putatively cause the differences in response times observed. Either way, more formal modelling would provide a platform towards identification of those specific features of the decision-making mechanisms that are being affected.
