Pallidal neuromodulation of the explore/exploit trade-off in decision-making

Ana Luisa de A Marcelino
Owen Gray
Bassam Al-Fatly
William Gilmour
J Douglas Steele
Andrea A Kühn
Tom Gilbertson

Curated by eLife

Evaluation Summary:

This paper presents an exploitation/exploration paradigm, analysed with a model-based approach, in 18 patients treated with GPi DBS for isolated dystonia. The main observation is that although DBS (used as a proxy for GPi inhibition) has no effect on the subjects' overall performance, it has a significant effect on the probability of exploration. This work will be of interest to scientists working in fundamental and clinical neuroscience.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

Abstract

Every decision that we make involves a conflict between exploiting our current knowledge of an action’s value and exploring alternative courses of action that might lead to a better, or worse, outcome. The sub-cortical nuclei that make up the basal ganglia have been proposed as a neural circuit that may contribute to resolving this explore-exploit ‘dilemma’. To test this hypothesis, we examined the effects of neuromodulating the basal ganglia’s output nucleus, the globus pallidus interna, in patients who had undergone deep brain stimulation (DBS) for isolated dystonia. Neuromodulation enhanced the number of exploratory choices to the lower value option in a two-armed bandit probabilistic reversal-learning task. Enhanced exploration was explained by a reduction in the rate of evidence accumulation (drift rate) in a reinforcement learning drift diffusion model. We estimated the functional connectivity profile between the stimulating DBS electrode and the rest of the brain using a normative functional connectome derived from healthy controls. Variation between patients in the extent of neuromodulation-induced exploration was associated with functional connectivity from the stimulation electrode site to a distributed brain functional network. We conclude that the basal ganglia’s output nucleus, the globus pallidus interna, can adaptively modify decision choice when faced with the dilemma to explore or exploit.

Version published to 10.7554/elife.79642 on eLife
Feb 2, 2023
eLife
Aug 11, 2022
Reviewer #1 (Public Review):

The manuscript is well written, clearly describes the scientific background and hypotheses, and provides a sound illustration of the results, which can advance our current understanding of the neural basis of decision-making processes. The main conclusion is that pallidal stimulation in patients with dystonia leads to an increased number of exploratory choices, i.e. choosing the option with a lower expected value instead of exploiting the option with the highest expected value. There are, however, some shortcomings that limit the interpretability of the data in its current form regarding the lack of a healthy control group, inconsistency between frequentist and Bayesian statistics applied, and the limited specificity of the connectome correlation analysis. These shortcomings should be addressed by the authors in order to improve the paper.

Detailed description of comments:

(1) Generalizability:
Studying dystonia patients offers a unique opportunity to study the effects of electrical pallidal stimulation on decision-making in humans and, given that dystonia primarily affects movement rather than cognition/decision-making, the findings might well be representative of healthy people. This (i.e. the similarity in task performance between patients and healthy people) is, however, not demonstrated in this study. In the introduction, the authors state that reward prediction error is intact in dystonic patients, but the paper that they cite for this (ref 34) is titled '... abnormal reward learning in cervical dystonia'. Furthermore, although clearly less pronounced than movement symptoms, cognitive problems are present in dystonia patients (see Jahanshahi 2017 Movement Disorders). I would therefore recommend enrolling a healthy control group, so that DBS ON and DBS OFF can be compared with healthy people.

(2) Statistics:
I understand that Bayesian statistics cannot always be compared directly with frequentist statistics. However, the frequentist and Bayesian statistics in this study do not seem consistent to me. ANOVAs etc. are applied to subject-averaged data, using a p-value of 0.05 to distinguish significant from non-significant results. In the Bayesian modelling analysis, the 95% HDI is computed. While this number is arbitrary (just like a p-value threshold of 0.05), it still has a rationale, given that 95% is also used in the scientific community for frequentist confidence intervals. Therefore, I think that 95% would be the most consistent choice here. However, none of the model parameters differ between ON and OFF according to the 95% HDIs, since they all include 0 (see 'Contrast' in table 1). The HDIs of the decision threshold and drift rate scaling parameter in particular overlap 0 substantially, yet they are still interpreted as significant on the basis of the Bayes factor. The Bayes factor, however, is not used for the behavioral analyses. For example, there are no effects of DBS on decision times, but at the computational level, several parameters (which predict the decision time) are affected. For the sake of consistency of analyses within the paper, I think the statistics of the Bayesian analyses should rely on the 95% HDI.
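
To make the criterion in this comment concrete, the following is a minimal sketch of how a 95% HDI can be computed from posterior samples and checked against 0. The function names and the sample values are hypothetical and are not taken from the manuscript; the HDI is taken here to be the narrowest interval that contains the requested share of the samples.

```python
import math


def hdi(samples, mass=0.95):
    """Return the narrowest interval (low, high) holding `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one sample")
    # Number of samples that must fall inside the interval.
    k = max(1, min(n, math.ceil(mass * n)))
    best_low, best_high = ordered[0], ordered[k - 1]
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    for start in range(1, n - k + 1):
        low, high = ordered[start], ordered[start + k - 1]
        if high - low < best_high - best_low:
            best_low, best_high = low, high
    return best_low, best_high


def excludes_zero(interval):
    """True when 0 lies strictly outside the interval."""
    low, high = interval
    return low > 0 or high < 0


# Hypothetical posterior samples of an ON minus OFF parameter contrast.
contrast_samples = [-0.4, -0.1, 0.0, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.9]
interval = hdi(contrast_samples, mass=0.95)
print(interval, excludes_zero(interval))
```

Under this criterion a contrast would count as credibly different from zero only when `excludes_zero` returns `True`, which mirrors how a 95% confidence interval is usually read in the frequentist analyses.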

(3) Connectome correlation analysis:
If I understand it correctly, the connectome analysis relates behavioral effects of stimulation to whole-brain networks rather than just local effects in the pallidum, by testing whether patients who showed stronger effects of stimulation have electrodes that are closer to connections with different brain areas. In the abstract, the results of this analysis are reported as "... was predicted by the degree of functional connectivity between the stimulating electrode and prefrontal and sensorimotor cortices". In the discussion, it is stated that "...DBS-induced enhanced exploration correlated with the functional connectivity of the stimulation volume in the GPI to frontal cortical regions identified previously in functional imaging studies of explore-exploit decision making ... The exploration-enhancing effects of GPI-DBS in our study were predicted by functional connectivity to brain regions whose neurons encode uncertainty [27] and predict behavioural switching [29, 30]". However, figure 4 essentially shows that almost the whole brain correlates with inter-individual differences in behavior, with correlation coefficients as strong as -0.7 in, e.g., the lower brain stem, cerebellum, and occipital cortex, none of which are mentioned in the paper. To me, it seems that there are correlations with very large and very distributed cortical areas rather than with specific areas in the prefrontal and sensorimotor cortex as stated in the paper.
Related to this point: the variable used for the connectomic correlation analysis is not the same variable that was affected by DBS in the statistical analysis. The statistical analysis found that P(explore) differed between DBS ON and OFF irrespective of the session. Instead, the "maximum within-session increase in P(Explore) DBS-ON - P(Explore) DBS-OFF" was used.

In general, could you please explain this analysis in more detail? If I understand it correctly each voxel had a value for 'connectivity' to the stimulation field and a value for 'behavioral effect' and across patients, this then gave an R-map. How was figure 4 thresholded (only the maximum positive and negative Rs are given in the color bar)? Then p-values are listed. One is 0.04 and another one is 0.009. What is the difference between the two? These values seem to reflect the correlation of similarity between the individual map with the group map and the behavioral variable, but was the correlation with the behavioral variable not already used for creating the R-map? Describing the analysis in more detail might help make it more understandable to the audience not familiar with the analysis (including me).
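
One reading of the analysis described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data layout (one list of voxel connectivity values per patient), the function names, and the toy numbers are all hypothetical. The leave-one-out step is one common way to avoid reusing a patient's behavioural score both to build the group map and to test it; whether the manuscript did this is exactly what the comment asks.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def r_map(connectivity, behaviour):
    """One correlation per voxel: connectivity across patients vs. behaviour."""
    n_voxels = len(connectivity[0])
    return [
        pearson([patient[v] for patient in connectivity], behaviour)
        for v in range(n_voxels)
    ]


def leave_one_out_similarity(connectivity, behaviour):
    """Spatial similarity of each patient's map to an R-map built without that patient."""
    scores = []
    for i, own_map in enumerate(connectivity):
        others = connectivity[:i] + connectivity[i + 1:]
        other_behaviour = behaviour[:i] + behaviour[i + 1:]
        scores.append(pearson(own_map, r_map(others, other_behaviour)))
    return scores


# Hypothetical toy data: 4 patients x 3 voxels, plus one behavioural score each.
toy_connectivity = [
    [0.1, 0.5, 0.3],
    [0.2, 0.4, 0.1],
    [0.4, 0.2, 0.2],
    [0.5, 0.1, 0.4],
]
toy_behaviour = [0.01, 0.03, 0.06, 0.08]

group_map = r_map(toy_connectivity, toy_behaviour)
similarity = leave_one_out_similarity(toy_connectivity, toy_behaviour)
print(group_map)
print(pearson(similarity, toy_behaviour))
```

In this sketch the final correlation between `similarity` and the behavioural scores is the kind of quantity a p-value could be attached to, and it is computed on maps that never saw the held-out patient.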

It is my understanding that high exploration (e.g. P(Explore) of 0.2) should be related to poorer task performance, since the optimal strategy would always use the high-value option and only switch rarely to identify the reversal(s). Why is it then that DBS can affect exploration but not the sum of rewards, if the two are related? Would DBS not affect the sum of rewards if, for example, its effect on P(explore) were more pronounced?
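
The link between exploration and payoff that this comment relies on can be made explicit with a back-of-envelope calculation. The sketch below assumes two options with fixed reward probabilities (0.8 and 0.2, hypothetical values that are not taken from the manuscript), assumes that every exploratory choice lands on the truly lower-value option, and ignores reversals. Under those assumptions the expected reward falls linearly with P(explore).

```python
def expected_reward_per_trial(p_explore, p_reward_high=0.8, p_reward_low=0.2):
    """Expected reward on one trial when the lower-value option is chosen
    with probability p_explore (reward probabilities are hypothetical)."""
    if not 0.0 <= p_explore <= 1.0:
        raise ValueError("p_explore must lie in [0, 1]")
    return (1.0 - p_explore) * p_reward_high + p_explore * p_reward_low


n_trials = 100  # hypothetical session length
for p_explore in (0.10, 0.15, 0.20):
    print(p_explore, n_trials * expected_reward_per_trial(p_explore))
```

With these illustrative numbers, a shift in P(explore) of a few percentage points changes the expected total by only a few rewards per hundred trials, so whether such a difference is detectable would depend on the trial count and on between-subject variability.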

Would the authors have predicted different effects for subthalamic deep brain stimulation? The DBS effects on the GPi are mainly interpreted in terms of reduced firing rate/activity. Since the STN exerts glutamatergic innervation of the GPi, should STN suppression lead to similar results? Conversely, GPe exerts GABAergic innervation of the STN. Should GPe suppression lead to the opposite behavioral effect? Were some of the electrodes localized within or close to the GPe rather than GPi and if so, did these patients show different behavioral effects?

Was the OFF vs ON DBS order counterbalanced? 3 patients did not complete the task OFF, and the ON dataset was not available in another patient. Did the authors check if the DBS order was relevant for the DBS effect on P(explore)?

Reviewer #2 (Public Review):

This work relies on the exploitation/exploration paradigm that has been proposed to describe a wide range of behaviour observed in ethology and experimental psychology. In the introduction, the authors thoroughly introduce this concept and then focus on a model in which the basal ganglia act as a decision filter whose bandwidth tunes the exploration/exploitation balance. The paper then combines an experimental approach in 18 patients treated with GPi DBS for isolated dystonia with a reinforcement learning drift diffusion model. Their main observation is that although DBS (used as a proxy for GPi inhibition) has no effect on the subjects' overall performance, it has a significant (albeit quite small) effect on the probability of exploration. They therefore conclude that the data support their working hypothesis that the BG play a role in the exploration strategy.

I have several major concerns with this paper which ultimately failed to convince me:

i) The fact that a change in exploration behaviour isn't accompanied by a modification of reward pay-off is at odds with the original theory of the exploration/exploitation balance. This should at least have been discussed in order to convince the reader of the robustness of the effect observed on P(Explore).
ii) Alternative hypotheses concerning the role of the BG in the exploration/exploitation trade-off can be proposed (habits vs goal-directed behaviour, reward-driven vs automatism, etc.). They are not ruled out by the experimental results (even if we take these for granted despite (i)).

Reviewer #3 (Public Review):

The manuscript examines the neural bases of the exploration/exploitation tradeoff - a crucial component of decision-making, that determines whether we choose the best option or explore less beneficial, but perhaps more informative alternatives. The authors specifically focus on the role of a substructure of the basal ganglia (the globus pallidus internus, or GPi) in modulating the amount of exploration in a simple learning task. This is a straightforward, well-designed study - albeit with a small patient sample, as is often the case in clinical data involving deep brain stimulation - and the computational modelling is rigorous. The presented work convincingly argues for the role of the GPi in suppressing exploration and enhancing exploitative choices.

Strengths of the present work

Testing DBS patients is a somewhat rare opportunity to directly observe the impact of stimulating or inactivating specific neural areas on human behavior. The present task's pallidal-DBS cohort and the ON/OFF stimulation manipulation make for a strong argument that the observed differences in behavior and model parameters are indeed due to the GPi, and the authors' proposed neural framework for how the GPi modulates exploration is well-supported and convincing.

The computational modelling is rigorous; the authors have shown how their selected model complements the data and model-free analyses, as well as conducted posterior predictive checks to test the extent to which recovered model parameters are actually informative.

This line of investigation is always relevant and timely, as most daily decisions from small-scale human decisions to large-scale AI machines involve calibrating exploration and exploitation in some form. Further insight into the neural mechanisms of this tradeoff, therefore, holds significance and countless potential applications.

Other Comments

While 'exploration' was historically defined - as in the present work - simply as choosing the non-greedy/non-maximizing option, work in the past decade or so has crucially distinguished between types of exploration that are explicitly aimed at seeking new information (i.e. directed exploration - specifically choosing the options that are less well-known, in order to build a more accurate world representation) and those that are independent of the informativeness or other properties of the other choice options (i.e. decision noise). Existing literature provides evidence for separate neural substrates for the two, and any model that will enrich our understanding of how the brain calibrates the explore/exploit tradeoff should at least touch on how these separate types of exploration fit into the proposed framework. It would therefore help contextualize and strengthen the presented work to include more discussion on precisely which type of exploration the GPi is modulating.

While the proposed model is well-presented and checked, some further clarification for readers who are not familiar with RLDDM might improve clarity. Furthermore, the model-free performance analyses as well as the brain connectivity analyses, while they clearly show a link between GPi stimulation and the overall amount of exploration, do not delve too deeply into the specific patterns of the exploratory behavior (e.g. by showing within-task fluctuations through a moving window of average exploration, or by describing further the differences in decision time between explore and exploit trials, etc.). The basic performance analyses are consistent with the authors' hypotheses and support the conclusions, but a more in-depth check of specific exploration patterns might help clarify the mechanism better.
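
For readers unfamiliar with the RLDDM, the general idea can be sketched as below: learned option values are updated with a delta rule, the difference between them (times a scaling parameter) sets the drift rate of a diffusion process, and the first boundary crossing gives the choice and decision time. This is a generic sketch rather than the authors' fitted model; all parameter values, function names, and the fixed reward probabilities (no reversals) are hypothetical. The `moving_average` helper illustrates the moving-window view of exploration suggested in this comment.

```python
import math
import random


def simulate_ddm_trial(drift, threshold, non_decision_time, rng,
                       dt=0.001, max_time=10.0):
    """Simulate one diffusion trial; return (choice, decision_time).

    choice 0 = upper boundary, 1 = lower boundary. The start point is
    midway between the boundaries (no bias), and noise has unit variance.
    """
    evidence = threshold / 2.0
    elapsed = 0.0
    while 0.0 < evidence < threshold and elapsed < max_time:
        evidence += drift * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        elapsed += dt
    # On a timeout, fall back to whichever boundary is closer.
    choice = 0 if evidence >= threshold / 2.0 else 1
    return choice, non_decision_time + elapsed


def run_rlddm(n_trials, reward_probs, learning_rate, drift_scaling,
              threshold, non_decision_time, seed=0):
    """Return a 0/1 list marking trials on which the lower-valued option was chosen."""
    rng = random.Random(seed)
    q = [0.5, 0.5]  # initial value estimates for the two options
    explored = []
    for _ in range(n_trials):
        # Drift rate is the scaled value difference between the options.
        drift = drift_scaling * (q[0] - q[1])
        choice, _decision_time = simulate_ddm_trial(
            drift, threshold, non_decision_time, rng)
        greedy = 0 if q[0] >= q[1] else 1
        explored.append(1 if choice != greedy else 0)
        reward = 1.0 if rng.random() < reward_probs[choice] else 0.0
        # Delta-rule update driven by the reward prediction error.
        q[choice] += learning_rate * (reward - q[choice])
    return explored


def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


explored = run_rlddm(n_trials=60, reward_probs=(0.8, 0.2), learning_rate=0.2,
                     drift_scaling=3.0, threshold=1.5, non_decision_time=0.3)
print(moving_average(explored, window=10))
```

In this formulation a smaller `drift_scaling` makes choices less sensitive to the value difference, which is one way a reduced drift rate could show up as more frequent choices of the lower-valued option.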
Version published to 10.1101/2022.04.21.489010 on bioRxiv
Apr 22, 2022
