Effects of dopamine D2/3 and opioid receptor antagonism on the trade-off between model-based and model-free behaviour in healthy volunteers

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This study provides novel evidence that a dopamine D2 receptor antagonist enhances model-based control of behavior, whereas blocking opioid receptors has no effect on the trade-off between habitual responding and goal-directed planning. These conclusions are based on compelling behavioral and computational modeling data and will be of interest to cognitive neuroscientists and computational psychiatrists.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

This article has been Reviewed by the following groups

Abstract

Human behaviour requires flexible arbitration between actions we do out of habit and actions that are directed towards a specific goal. Drugs that target opioid and dopamine receptors are notorious for inducing maladaptive habitual drug consumption; yet, how the opioidergic and dopaminergic neurotransmitter systems contribute to the arbitration between habitual and goal-directed behaviour is poorly understood. By combining pharmacological challenges with a well-established decision-making task and a novel computational model, we show that the administration of the dopamine D2/3 receptor antagonist amisulpride led to an increase in goal-directed or ‘model-based’ relative to habitual or ‘model-free’ behaviour, whereas the non-selective opioid receptor antagonist naltrexone had no appreciable effect. The effect of amisulpride on model-based/model-free behaviour did not scale with drug serum levels in the blood. Furthermore, participants with higher amisulpride serum levels showed higher explorative behaviour. These findings highlight the distinct functional contributions of dopamine and opioid receptors to goal-directed and habitual behaviour and support the notion that even small doses of amisulpride promote flexible application of cognitive control.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    This study examines whether the D2 receptor antagonist amisulpride and the mu-opioid receptor antagonist naltrexone bias model-based vs model-free behavior in a well-established two-step task of behavioral control. The authors find that amisulpride enhances model-based choices, which is further supported by computational modeling of the data, revealing an increase in the relative contribution of model-based control of behavior. Naltrexone, on the other hand, had no reliable effect on model-based behavior.

    Overall, this is a very nice study with many strengths, including the task and data analysis. A particular strength of the design is the combination of a between-subject drug administration protocol with two within-subject (baseline vs. drug) sessions. This reduces between-subject variability in baseline model-based vs model-free behavior and enhances the power to detect drug effects.

    The introduction could do a better job articulating the rationale for testing the effect of these two specific drugs. Currently, the rationale is that both transmitter systems targeted by these drugs are involved in drug addiction, which is characterized by an imbalance in model-based vs. habitual control of behavior. This appears somewhat indirect.

    Blood draws were used to determine serum levels for amisulpride and naltrexone but these data are not included as covariates in the analysis.

    We thank the reviewer for the high acclaim of our study, and for the constructive comments to improve it. We acknowledge that the introduction did not motivate the main research goal of the manuscript clearly enough. We have now extended this section and provided further insight into our reasoning behind the study design. Beyond the involvement of opioid- and dopamine-promoting drugs in addiction, there is abundant evidence from experimental studies showing comparable effects of manipulating receptors of both systems in model-free processes such as reinforcement and habit formation. Based on this overlap one may predict that both neurotransmitter systems contribute to habit formation in a similar fashion, and that blocking their respective receptors will improve the ability to behave in a model-based manner. However, as we now elaborate in the manuscript, an argument against this could be that disrupting model-free processes might not be enough to promote model-based behaviour, as such behaviour relies heavily on cognitive control. It is therefore especially interesting to compare opioid antagonists, which do not enhance cognitive function, with a D2 antagonist at a dosage that has been shown to increase cognitive control as well as the desire to exert cognitive effort.

    This is expressed in the following paragraphs of the Introduction (p.2 §3 and p.3 §1):

    “Opiates, psychostimulants, and most other drugs of abuse increase the release of dopamine along the mesolimbic pathway (Chiara, 1999; Koob & Bloom, 1988), a circuit that plays a central role in reinforcement learning (Schultz, Dayan, & Montague, 1997). On top of this, the reinforcing properties of addictive drugs also depend on their ability to activate the μ opioid receptors (Becker, Grecksch, & Kraus, 2002; Benjamin, Grant, & Pohorecky, 1993; Le Merrer, Becker, Befort, & Kieffer, 2009). This suggests that both the dopamine and the opioid systems might be particularly relevant in model-free reinforcement learning processes that drive the formation of habitual behaviour. Studies in rodents show that activating receptors of both systems across the striatum increases cue-triggered wanting of rewards (Peciña & Berridge, 2013; Soares-Cunha et al., 2016). Conversely, inhibition of both D1-type and D2-type of dopamine receptors (referred to as D1 and D2 from here on) as well as opioid receptors reduces motivation to obtain or consume rewards (Laurent, Leung, Maidment, & Balleine, 2012; Peciña, 2008; Soares-Cunha et al., 2016). This data raises the hypothesis that the drift towards habitual control is enabled by dopamine and opioid receptors via a common neural pathway. Recent work in humans provides some evidence in this direction, whereby systemic administration of opioid and D2 dopamine receptor antagonists causes a comparable reduction of cue responsivity and reward impulsivity (Weber et al., 2016) and decreases the effort to obtain immediate primary rewards (Korb et al., 2020). This suggests that when allocating control between the model-based and model-free system, dopamine or opioid receptor antagonists might comparatively disrupt model-free behavioural strategies and increase model-based behaviour. Yet, no study in humans has directly investigated this. 
Furthermore, disrupting habit formation might not in itself lead to increased model-based control, without either increasing the perceived value of applying cognitive control or making it easier to do so.”

    We also mention the implications of this direct comparison of the two compounds in the Discussion (p.8 §1):

    “Our findings provide initial evidence for a divergent involvement of the dopamine and opioid neurotransmitter systems in the shift between habitual and goal-directed behaviour. The lack of effects of naltrexone on the model-based/model-free trade-off also provides some support for the notion that simply disrupting neurobiological systems that subserve habitual behaviour might not be enough to increase goal-directed behaviour in this task. An increase in the model-based/model-free weight following amisulpride administration advocates for dopamine playing a decisive role in flexibly applying cognitive control to facilitate model-based behavior and highlights the specific functional contribution of the D2 receptor subtype.”

    Reviewer #3 (Public Review):

    I think this is an interesting study on an important topic. I agree that there is not enough research to understand how the dopaminergic system interfaces with goal-directed planning, and I like the focus on specific types of dopamine receptors. It is interesting that they seem to find a specific effect on just the dopamine antagonist. I also appreciate the clarity with which the authors describe this field of research and their results. However, I also feel that there are several concerns with this paper, both in terms of framing and in terms of the experimental design and analysis. For completeness, I must note that I am not a dopamine expert.

    I felt that the introduction of the paper did not sufficiently motivate the focus on the comparison between neurotransmitter systems, and (for the dopaminergic system) the distinction between D1/D2 receptors. Why is the mapping between stability/flexibility and D1/D2 receptors important? How does this relate to model-based control? Why do the authors predict that model-based control would increase when D2 receptors are blocked? If the hypothesis is about contrasting the contribution of D1 and D2 receptors to goal-directed control, why did the authors not use antagonists directly targeting these two systems?

    In addition, the predictions that are more explicit, for example, that blocking D2 receptors increases MB control by stabilizing goal-relevant information, are fairly specific. However, the current version of the two-step task is not amenable to testing such a specific hypothesis, because it doesn't allow us to measure the specific components of planning (e.g., maintaining goals, the representation of the structure, prospective reasoning). Moreover, MB control in this version of the two-step task is marked by flexibility, because it requires the agent to be sensitive to switching starting states.

    The predictions for the opioid system are also lacking. Why are the authors targeting this system? Why are they comparing the effects of the D2 antagonist with the opioid antagonist? Why do the authors predict that amisulpride should have a stronger effect than naltrexone? In my opinion, these predictions were not sufficiently laid out, which made it difficult to appreciate the authors' motivation to run the study.

    We thank the reviewer for their critical take on the manuscript and for clearly pointing out the weaknesses in argumentation. In particular, we appreciate the reviewer’s comment on the lack of clarity in describing why the comparison of dopamine and opioid antagonists’ effects on MB/MF behaviour might be particularly interesting and why we focused on D2 and not D1 receptors. We have now extended the introduction section to clarify our rationale for comparing these two compounds (p.2-3). In short, apart from the fact that both systems are implicated in addiction, there is also abundant experimental evidence from human and non-human animal studies that the two systems are involved in processes related to forming habitual responses to primary and secondary rewards. This suggests that blocking receptors of either system might comparatively affect the MB/MF trade-off by impairing model-free processes. We therefore proceeded to compare opioid and dopamine antagonists.

    As we note, using D1 antagonists would likely be detrimental to cognitive control related processes, and therefore more likely to decrease model-based performance. We therefore chose to compare opioid antagonists to D2 receptor antagonists. Another important reason for comparing the effects of opioid and D2 dopamine antagonists is that it is not clear whether blocking model-free processes is in itself enough to promote model-based behaviour, without boosting cognitive control related processes. Given the recent evidence for D2 antagonists increasing cognitive effort (Westbrook et al., 2020) and the proposed role of prefrontal D2 receptors in destabilising prefrontal representations (according to the dual state theory of prefrontal dopamine function proposed by Durstewitz & Seamans, 2008), we reasoned that D2 receptor blockade might also boost the ability (or willingness) to keep the mapping between spaceships and planets online while making choices.

    We incorporated these arguments in the revised Introduction (p.2-3):

    “Opiates, psychostimulants, and most other drugs of abuse increase the release of dopamine along the mesolimbic pathway (Chiara, 1999; Koob & Bloom, 1988), a circuit that plays a central role in reinforcement learning (Schultz et al., 1997). On top of this, the reinforcing properties of addictive drugs also depend on their ability to activate the μ opioid receptors (Becker et al., 2002; Benjamin et al., 1993; Le Merrer et al., 2009). This suggests that both the dopamine and the opioid systems might be particularly relevant in model-free reinforcement learning processes that drive the formation of habitual behaviour. Studies in rodents show that activating receptors of both systems across the striatum increases cue-triggered wanting of rewards (Peciña & Berridge, 2013; Soares-Cunha et al., 2016). Conversely, inhibition of both D1-type and D2-type of dopamine receptors (referred to as D1 and D2 from here on) as well as opioid receptors reduces motivation to obtain or consume rewards (Laurent et al., 2012; Peciña, 2008; Soares-Cunha et al., 2016). This data raises the hypothesis that the drift towards habitual control is enabled by dopamine and opioid receptors via a common neural pathway. Recent work in humans provides some evidence in this direction, whereby systemic administration of opioid and D2 dopamine receptor antagonists causes a comparable reduction of cue responsivity and reward impulsivity (Weber et al., 2016) and decreases the effort to obtain immediate primary rewards (Korb et al., 2020). This suggests that when allocating control between the model-based and model-free system, dopamine or opioid receptor antagonists might comparatively disrupt model-free behavioural strategies and increase model-based behaviour. Yet, no study in humans has directly investigated this. 
Furthermore, disrupting habit formation might not in itself lead to increased model-based control, without either increasing the perceived value of applying cognitive control or making it easier to do so. Crucially, there are important differences in how each of the two neurochemical systems relates to the cognitive control that is pivotal for model-based behaviour. Across a wide range of studies using various dosing schemes, opioid receptor antagonists did not have an effect on tasks that require cognitive control, such as working memory (Del Campo, McMurray, Besser, & Grossman, 1992; File & Silverstone, 1981; Volavka, Dornbush, Mallya, & Cho, 1979), sustained attention (Zacny, Coalson, Lichtor, Yajnik, & Thapar, 1994), or mathematical problem-solving (Del Campo et al., 1992) (see van Steenbergen, Eikemo, & Leknes, 2019, for a review). Dopaminergic circuits, on the other hand, play a central role in higher cognitive functions and goal-directed behaviour (Brozoski, Brown, Rosvold, & Goldman, 1979). In particular, D1 dopamine receptors in the prefrontal cortex enable maintenance of goal-relevant information and working memory (Goldman-Rakic, 1997; Sawaguchi & Goldman-Rakic, 1991; van Schouwenburg, Aarts, & Cools, 2010; Williams & Goldman-Rakic, 1995), while D2 dopamine receptor activity disrupts prefrontal representations (Durstewitz & Seamans, 2008). In support of this, decreased working memory performance was observed after blocking prefrontal D1, but not prefrontal D2 receptors (Arnsten, 2011; Sawaguchi & Goldman-Rakic, 1991; Seamans & Yang, 2004). In humans, systemic administration of D2 antagonists increased the ability to maintain and manipulate working memory representations (Dodds et al., 2009; Frank & O’Reilly, 2006) and increased the value of applying cognitive effort (Westbrook et al., 2020).
This data suggests that blocking D2 receptors, in contrast to blocking opioid receptors, could further facilitate model-based behaviour through enabling or encouraging flexible use of cognitive control.”

    Another important point that the reviewer stresses is that the two-step task we use does not allow us to draw any conclusions about the mechanisms through which amisulpride increases model-based behaviour. Although we base our hypothesis that D2 antagonism might promote model-based behaviour (on top of disrupting habit formation) on previous work showing that D2 blockade increases cognitive effort and the ability to manipulate working memory representations, we completely agree that our setup does not give any definite answers about which of these cognitive processes mediated the increase in model-based weights. In the discussion we try to interpret our findings in the context of the dual-state hypothesis framework and within the framework of striatal control of adaptive behaviour (p.8 §3-4), whereby we centre our argumentation around dopaminergic circuits that subserve one or the other mechanism.

    We agree with the reviewer that the task requires a high degree of flexible planning and that the dual-state theory might not be enough to account for our effects. We mention this in the Discussion (p. 8 §3):

    “The effects of D2 antagonism on model-based/model-free behaviour in our study can be interpreted within this [dual-state] framework to result from increased ability to maintain prefrontal representation of the mapping between the spaceships and the planets online. However, this is difficult to reconcile with the fact that model-based behaviour in dynamic learning paradigms, such as the one used here, also requires flexible updating of action values.”

    We also elaborate on the general limitations of drawing inference about the underlying cognitive/computational mechanisms in the Discussion (p. 14 §2):

    “Importantly, it should also be acknowledged that the behavioural setup in our study does not allow us to draw definite conclusions about the mechanisms that mediate amisulpride’s effects on model-based or model-free behaviour. For example, it is not clear whether amisulpride increases the perceived benefit of applying cognitive control, or whether it increases the participant’s ability to do so through various possible complementary processes, such as goal maintenance or planning abilities. Future studies should further elucidate the mechanistic contributions of dopamine receptors to the distinct coding and utilisation of task relevant representations (Langdon, Sharpe, Schoenbaum, & Niv, 2018; Stalnaker et al., 2019).”

    Related to this, I felt that the introduction was a bit too quiet on the genetic markers. Their discussion in the results was a bit surprising, and it wasn't quite clear why the authors decided to investigate these interaction effects.

    We appreciate this comment as we were quite uncertain ourselves about how much weight to give to those data. Previous research had indeed shown profound variability in MB/MF behaviour across genotypes related to baseline dopamine function. The main purpose of the genetic analysis was to control for potential baseline differences and to explore the drug-by-genotype interactions. However, including the serum data as a covariate in analyses, as suggested by the other reviewers, made most results relating to the genetic analysis disappear, even when using less conservative priors that likely understate the variance of posterior distributions of group effects. We have therefore opted to keep coverage of the genetic data to a minimum, but still report the results and make the data available online for future studies.

    I found some of the core results confusing. Most importantly, why does amisulpride make people less likely to stay after a reward when the first-stage state is the same? When first-stage states repeat, both an MB agent and an MF agent will be more likely to stay after a reward. To me, this kind of behavior doesn't seem particularly model-based. Why does this behavior occur under amisulpride? I was surprised that the authors did not really address it.

    We agree that these results have been somewhat difficult to reconcile. However, adding amisulpride serum levels to our analyses now allows us to get a better understanding. It seems that model-based behaviour was increased across both serum groups; however, only in the high serum group did we additionally observe increased exploration. We also note that increased exploration was related to a reduced effect of previous points in trials with the same first-stage state, whereas the interaction term (effect of previous points in different vs. same state trials) was more strongly associated with the model-based weight. In the manuscript this is described in the Results section and in the Discussion.

    The following text is included in the Results (p.6):

    “We first observed that the more model-based choices the participants made, the more money they earned (r = 0.65, 95% CI [0.53, 0.76]). This serves as a validity check of the task, which was designed to make cognitive control pay off (literally) [45]. We then looked at how the model parameters relate to the random slopes from the behavioural analysis of staying behaviour and found that the participant-level (random effect) slope for the effect of previous points on staying behaviour in different vs. same first state trials was most strongly related to ω (d = 0.493, P < 10e-3) and negatively related to the inverse temperature parameter η (d = -0.328, P < 10e-3), and the slope for trials with same first states was mostly related to η (d = 0.822, P < 10e-3), and less so to ω (d = 0.235, P < 10e-3).”
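To make the role of ω concrete, the sketch below shows one common way a model-based/model-free weight enters hybrid reinforcement-learning models of two-step tasks: as a linear mixture of the two systems' action values. This is an illustration under assumptions, not the authors' model, whose exact equations are not given here. The function name `hybrid_values` and all numeric values are hypothetical.

```python
def hybrid_values(q_mb, q_mf, omega):
    """Mix model-based and model-free action values.

    Assumed form: omega = 1 is purely model-based, omega = 0 purely model-free.
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return [omega * mb + (1.0 - omega) * mf for mb, mf in zip(q_mb, q_mf)]


# Hypothetical action values for two first-stage options (e.g. two spaceships).
q_mb = [0.8, 0.2]  # values from planning over the task structure
q_mf = [0.3, 0.6]  # values from direct reward history

for omega in (0.0, 0.5, 1.0):
    print(omega, hybrid_values(q_mb, q_mf, omega))
```

Under this assumed form, a drug-induced increase in ω shifts the mixed values towards those computed from the task structure, which is the sense in which a higher weight reflects more model-based control.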

    The following text is included in the Discussion (p.8 §2):

    “Interestingly, amisulpride also increased choice stochasticity parametrised by the softmax inverse temperature parameter. In a paradigm with two choice options, it cannot be definitively determined whether this indicates higher decision-noise or increased exploration of alternative choices. We can however speculate that increased decision noise would lead to overall detrimental effects on learning in both trial types with same and different consecutive first stage states, which we do not observe in our data. The effect on the choice stochasticity parameter was only present in participants with a higher effective dose [75], suggesting that the effect was more likely to be post-synaptic. Similarly, in the same effective dose group, we found some evidence that amisulpride reduces response stickiness, indicating increased switching between actions. This is well in line with a prominent model of the cortico-striatal circuitry implicating post-synaptic D2 receptors in exploration/exploitation [65] and supported by empirical data. In animal studies, activation of D2 receptors was shown to lead to choice perseverance and more deterministic behaviour, whereas D2 receptor inhibition increases the probability of performing competing actions and increases randomness in action selection [76]. In humans, a recent neurochemical imaging study showed that D2 receptor availability in the striatum correlated with choice uncertainty parameters across both reinforcement learning and active inference computational modelling frameworks [77]. Increased choice uncertainty was also observed in social and non-social learning tasks in a study using 800 mg of sulpiride, a dose that is known to exert post-synaptic effects [54,78]. We note, however, that the evidence for the difference in exploration between the low and high serum groups was not robust (p=0.066).
Furthermore, it has been suggested that increased striatal dopamine is also related to a tendency for stochastic, undirected exploration [79,80], arising due to overall uncertainty across available options [79] or through increasing the opportunity cost of choosing the wrong option [68,71]. This suggests that the same biological signature that leads to increased cognitive effort expenditure also promotes choice exploration. In line with this, both prior studies that investigated the effect of increasing dopamine availability with L-DOPA on model-based/model-free behaviour observed increased choice exploration as well as increased model-based behaviour (although in one it was only present in individuals with a higher working memory capacity) [55,58].”
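The link between the inverse temperature, response stickiness, and choice stochasticity discussed above can be illustrated with a generic softmax choice rule. The sketch below is a standard textbook form, not the parametrisation used in the manuscript. The function `choice_probabilities`, its additive stickiness bonus, and the example values are assumptions made for illustration.

```python
import math


def choice_probabilities(values, eta, stickiness=0.0, prev_choice=None):
    """Softmax over action values with inverse temperature eta.

    Assumed form: the previously chosen action (if any) receives an additive
    stickiness bonus before the softmax is applied.
    """
    logits = [eta * v for v in values]
    if prev_choice is not None:
        logits[prev_choice] += stickiness
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical action values; option 0 is the better one.
values = [0.7, 0.3]

# Lower eta flattens the distribution, i.e. choices become more stochastic.
for eta in (0.5, 2.0, 8.0):
    print(eta, choice_probabilities(values, eta))

# Lower stickiness reduces the pull towards repeating the previous choice.
for stickiness in (0.0, 1.0):
    print(stickiness, choice_probabilities(values, 2.0, stickiness, prev_choice=1))
```

With only two options, a flatter distribution looks the same whether it arises from decision noise or from deliberate exploration, which is why the quoted passage treats the two interpretations as indistinguishable from this parameter alone.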

    With regards to the design, it is unfortunate that the order of drug administration is not counterbalanced. As far as I understand, model-based control is always measured without a drug in the first session, and then with the drug (or placebo) in the second. The change between sessions is then tested for all three conditions. Of course, it is possible that the increase in model-based control in the amisulpride condition is only driven by the drug. However, given the lack of counterbalancing, it's also possible that amisulpride increases model-based control only after the experience with the task. That is, if the authors had counterbalanced the drug effect, they may have found that amisulpride had a different effect if it was administered in the first session. That would have changed their interpretation quite a bit! As it stands, they are unable to verify their (admittedly simpler) hypothesis that there is only a main effect.

    We thank the reviewer for this comment. Indeed, a full within-subject design would have been statistically more powerful and would have enabled us to exclude the possibility that amisulpride’s effect on model-based behaviour is indirect. We have now included the following paragraph in the discussion that aims to highlight the limitation of not counterbalancing the drug administration (p.10):

    “One of the strengths of our design is a baseline measure, and the fact that the participants were all introduced to the task under no administration, thus avoiding potential effects of the treatment on task training. Although this design allowed us to reduce between-subjects variability, we cannot completely exclude order effects. Although unlikely, it is possible that the effects of the treatment that we observe come indirectly from the effects of the two drugs on either skill transfer from the previous session or the part of the experiment that preceded the task. For instance, participants under amisulpride could be less tired from other tasks and therefore more willing to exert effort in the task presented here. Speaking against this is the observation that we found no differences in mood between amisulpride and placebo regardless of low or high serum levels.”

  2. Evaluation Summary:

    This study provides novel evidence that a dopamine D2 receptor antagonist enhances model-based control of behavior, whereas blocking opioid receptors has no effect on the trade-off between habitual responding and goal-directed planning. These conclusions are based on compelling behavioral and computational modeling data and will be of interest to cognitive neuroscientists and computational psychiatrists.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

  3. Reviewer #1 (Public Review):

    This study examines whether the D2 receptor antagonist amisulpride and the mu-opioid receptor antagonist naltrexone bias model-based vs model-free behavior in a well-established two-step task of behavioral control. The authors find that amisulpride enhances model-based choices, which is further supported by computational modeling of the data, revealing an increase in the relative contribution of model-based control of behavior. Naltrexone, on the other hand, had no reliable effect on model-based behavior.

    Overall, this is a very nice study with many strengths, including the task and data analysis. A particular strength of the design is the combination of a between-subject drug administration protocol with two within-subject (baseline vs. drug) sessions. This reduces between-subject variability in baseline model-based vs model-free behavior and enhances the power to detect drug effects.

    The introduction could do a better job articulating the rationale for testing the effect of these two specific drugs. Currently, the rationale is that both transmitter systems targeted by these drugs are involved in drug addiction, which is characterized by an imbalance in model-based vs. habitual control of behavior. This appears somewhat indirect.

    Blood draws were used to determine serum levels for amisulpride and naltrexone but these data are not included as covariates in the analysis.

  4. Reviewer #2 (Public Review):

    Insights into the neurochemical mechanisms underlying the trade-off between model-based and model-free behavior may lead to a deeper understanding of the decision-making deficits in several clinical disorders. The current findings show that pharmacologically blocking dopaminergic neurotransmission strengthens model-based over model-free behavior, whereas reduced opioidergic activity showed no significant effects. The current investigation has several strengths: It uses an established task and a sophisticated computational model to quantify the balance between the model-based and the model-free system. The main weakness is that the interpretation of the data is hampered by the fact that the administered dose of amisulpride can lead to both presynaptic and postsynaptic effects.

  5. Reviewer #3 (Public Review):

    I think this is an interesting study on an important topic. I agree that there is not enough research to understand how the dopaminergic system interfaces with goal-directed planning, and I like the focus on specific types of dopamine receptors. It is interesting that they seem to find a specific effect on just the dopamine antagonist. I also appreciate the clarity with which the authors describe this field of research and their results. However, I also feel that there are several concerns with this paper, both in terms of framing and in terms of the experimental design and analysis. For completeness, I must note that I am not a dopamine expert.

    I felt that the introduction of the paper did not sufficiently motivate the focus on the comparison between neurotransmitter systems, and (for the dopaminergic system) the distinction between D1/D2 receptors. Why is the mapping between stability/flexibility and D1/D2 receptors important? How does this relate to model-based control? Why do the authors predict that model-based control would increase when D2 receptors are blocked? If the hypothesis is about contrasting the contribution of D1 and D2 receptors to goal-directed control, why did the authors not use antagonists directly targeting these two systems?

    In addition, the predictions that are more explicit, for example, that blocking D2 receptors increases MB control by stabilizing goal-relevant information, are fairly specific. However, the current version of the two-step task is not amenable to testing such a specific hypothesis, because it doesn't allow us to measure the specific components of planning (e.g., maintaining goals, the representation of the structure, prospective reasoning). Moreover, MB control in this version of the two-step task is marked by flexibility, because it requires the agent to be sensitive to switching starting states.

    The predictions for the opioid system are also lacking. Why are the authors targeting this system? Why are they comparing the effects of the D2 antagonist with the opioid antagonist? Why do the authors predict that amisulpride should have a stronger effect than naltrexone? In my opinion, these predictions were not sufficiently laid out, which made it difficult to appreciate the authors' motivation to run the study.

    Related to this, I felt that the introduction was a bit too quiet on the genetic markers. Their discussion in the results was a bit surprising, and it wasn't quite clear why the authors decided to investigate these interaction effects.

    I found some of the core results confusing. Most importantly, why does amisulpride make people less likely to stay after a reward when the first-stage state is the same? When first-stage states repeat, both an MB agent and an MF agent will be more likely to stay after a reward. To me, this kind of behavior doesn't seem particularly model-based. Why does this behavior occur under amisulpride? I was surprised that the authors did not really address it.

    With regards to the design, it is unfortunate that the order of drug administration is not counterbalanced. As far as I understand, model-based control is always measured without a drug in the first session, and then with the drug (or placebo) in the second. The change between sessions is then tested for all three conditions. Of course, it is possible that the increase in model-based control in the amisulpride condition is only driven by the drug. However, given the lack of counterbalancing, it's also possible that amisulpride increases model-based control only after the experience with the task. That is, if the authors had counterbalanced the drug effect, they may have found that amisulpride had a different effect if it was administered in the first session. That would have changed their interpretation quite a bit! As it stands, they are unable to verify their (admittedly simpler) hypothesis that there is only a main effect.