Flexible control of representational dynamics in a disinhibition-based model of decision-making

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This work provides a promising first pass at providing an integrative model for how decisions arise from neural circuits. The approach is novel but lacks a more rigorous vetting against alternative model formulations to be able to determine its true significance. More stringent evaluations of the model in the context of existing work, as well as a clearer description of the goals and implementation of the approach, would help to address these concerns.

Abstract

Inhibition is crucial for brain function, regulating network activity by balancing excitation and implementing gain control. Recent evidence suggests that beyond simply inhibiting excitatory activity, inhibitory neurons can also shape circuit function through disinhibition. While disinhibitory circuit motifs have been implicated in cognitive processes, including learning, attentional selection, and input gating, the role of disinhibition is largely unexplored in the study of decision-making. Here, we show that disinhibition provides a simple circuit motif for fast, dynamic control of network state and function. This dynamic control allows a disinhibition-based decision model to reproduce both value normalization and winner-take-all dynamics, the two central features of neurobiological decision-making captured in separate existing models with distinct circuit motifs. In addition, the disinhibition model exhibits flexible attractor dynamics consistent with different forms of persistent activity seen in working memory. Fitting the model to empirical data shows it captures well both the neurophysiological dynamics of value coding and psychometric choice behavior. Furthermore, the biological basis of disinhibition provides a simple mechanism for flexible top-down control of the network states, enabling the circuit to capture diverse task-dependent neural dynamics. These results suggest a biologically plausible unifying mechanism for decision-making and emphasize the importance of local disinhibition in neural processing.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    This work presents a unification model (of sorts) for explaining how the flow of evidence through networks can be controlled during decision-making. The authors combine two general frameworks previously used as neural models of cortical decision-making, dynamic normalization (which implements value encoding via firing activity) and recurrent network models (which capture winner-take-all selection processes), into a unified model called the local disinhibition-based decision model (LDDM). The simple motif of the LDDM allows for the disinhibition of excitatory cells that represent the engagement of individual actions that happens through a recurrent inhibitory loop (i.e., a leaky competing accumulator). The authors show that the LDDM is effective at explaining both decision dynamics and the properties of cortical cells during perceptual decision-making tasks.

    All in all, I thought this was an interesting study with an ambitious goal. But like any good study, there are some open issues worth noting and correcting.

    MAJOR CONCERNS

    1. Big picture

    This was a comprehensive and extremely well-vetted set of theoretical experiments. However, the scope and complexity also made the take-home message hard to discern. The abstract and most of the introduction focus on the framing of LDDM as a hybrid of dynamic normalization models (DNM) and recurrent network models (RNMs). This is sold as a unification of value normalization and selection into a novel unified framework. Then the focus shifts to the role of disinhibition in decision-making. Then in the Discussion, the goal is stated as to determine whether the LDDM generates persistent activity and does this activity differ from RNMs. As a reader, it seems like the paper jumps between two high-level goals: 1) the unification of DNM and RNM architectures, and 2) the role of disinhibition. This constant changing makes it hard to focus as the reader goes on. So what is the big picture goal specifically?

    Also, the framing of value normalization and WTA as a novel computational goal is a bit odd as this is a major focus of the field of reinforcement learning (both abstractly at the computational level and more concretely in models of the circuits that regulate it). I know that the authors do not think they are the first to unify value judgements with selection criteria. The writing just comes across that way and should be clarified.

    We thank the Reviewer for their thoughtful consideration of the overall framing of the big picture goals of the paper. Upon reflection, we agree that the paper really centers on the importance of incorporating disinhibition into computational circuit-based models of decision-making. Thus, we have significantly revised the Introduction and Discussion to focus on the theoretical and empirical importance of incorporating disinhibition into computational models of decision-making, and use the integration of value normalization and WTA selection as an example of how disinhibition increases the richness of circuit decision models. Please see the response to recommendations below for more detail on the changes.

    2. Link to other models

    The LDDM is described as a novel unification of value normalization and winner-take-all (WTA) selection, combining value processing and selection. While the authors do an excellent job of referencing a significant chunk of the decision neuroscience literature (160 references!) the motif they end up designing has a highly similar structure to a well-known neural circuit linked to decision-making: the cortico-basal ganglia pathways. Extensive work over the past 20+ years has highlighted how cortical-basal ganglia loops work via disinhibition of cortical decision units in a similar way as the LDDM (see the work by Michael Frank, Wei Wei, Jonathan Rubin, Fred Hamker, Rafal Bogacz, and many others). It was surprising to not see this link brought up in the paper as most of the framing was on the possibility of the LDDM representing cortical motifs, yet as far as I know, there does not exist evidence for such architectures in the cortex, but there is in these cortical-basal ganglia systems.

    We thank the Reviewer for the suggestion to link the LDDM to disinhibition in cortico-basal ganglia (CBG) models; this is indeed an important body of empirical and computational work that we overlooked in the original manuscript. We have now added text to the Discussion to highlight the link between the LDDM and these CBG disinhibition models, focusing on how they are conceptually similar and how they differ. Please see our response to recommendations below for a more detailed discussion of the revisions.

    3. Model evaluations

    The authors do a great job of extensively probing the LDDM under different conditions and against some empirical data. However, most of the time there is no "control" model or current state-of-the-art model that the LDDM is being compared against. In a few of the simulation experiments, the LDDM is compared against the DNM and RNM alone, so as to show how the two components of the LDDM motif compare against the holistic model itself. But this component model comparison is inconsistently used across simulation experiments.

    Also, it is worth asking whether the DNM and RNM are appropriate comparison models to vet the LDDM against for two reasons. First, these are the components of the full LDDM. So these tests show us how the two underlying architectural systems that go into LDDM perform independently, but not necessarily how the LDDM compares against other architectures without these features. Second, as pointed out in my previous comment, the LDDM is a more complex model, with more parameters, than either the DNM or RNM. The field of decision neuroscience is awash in competing decision models (including probabilistic attractor models, non-recurrent integrators, etc.). If we really want to understand the utility of the LDDM, it would be good to know how it performs against similarly complex models, as opposed to its two underlying component models.

    We greatly appreciate the Reviewer’s comments on model comparison, which point out that our original manuscript failed to clearly convey a very important difference between the LDDM and the existing RNM(s). In the revision, we now make it clearer that the fundamental difference between the LDDM and the RNMs is the architecture of disinhibition (see the revised Introduction, especially p. 8 lines 164-168). The LDDM is not simply a combination of the DNM model with RNM architecture (a point we may have mistakenly conveyed in the original manuscript): the introduction of disinhibition separates LDDM inhibition into option-selective subpopulations, as opposed to the single pooled inhibition of RNM models. Given this fact, the LDDM predicts the unique selective inhibition dynamics shown in recent optogenetic and calcium imaging results, a finding inconsistent with the common-pooled and non-selective inhibition assumed in the existing RNMs and many of their variants. Thus, we believe that a comparison between the LDDM and the RNM, which share a similar level of complexity and number of parameters, is important.

    We also appreciated the Reviewer’s concern about testing the LDDM against alternative models. In order to better connect to the existing literature, we now compare the LDDM to another standard circuit model of decision-making - the leaky competing accumulator (LCA) model. The LCA is a circuit model that captures many of the aspects of perceptual decision-making seen in the mathematical drift diffusion model (DDM), but with a construction that allows for fitting to behavioral data and comparison of underlying unit activities. Please see our response to recommendations below for further detail.

    4. Comparison to physiological data

    I quite enjoyed the comparisons of the excitatory cell activity to empirical data from the Shadlen lab experiments. However, these were largely qualitative in nature. In conjunction with my prior point on the models that the LDDM is being compared against, it would be ideal to have a direct measure of model fits that can be used to compare the performance of different competing "control" models. These measures would have to account for differences in model complexity (e.g., AIC or BIC), but such an analysis would help the reader understand the utility of the LDDM in connecting with empirical data much better.

    We agree with the Reviewer that a quantitative comparison of the match between model neural predictions and empirical neurophysiological data is important. First, we wish to clarify that the model neural predictions are simulated from models fit to the behavioral data (choice and RT), not from fits to the neural activity traces, a point we now clarify in the text. While directly fitting dynamic models (LDDM, RNM, or LCA) to the neurophysiological data is appealing, there are currently several obstacles to this approach. The first problem is the complexity of the dynamic neural traces. Despite the long history of the random-dot motion paradigm, detailed features of the dynamics are still not understood. For example, the stereotyped initial dip after stimulus onset may reflect a reset of the network state to improve the signal-to-noise ratio (Conen and Padoa-Schioppa, 2015) or simply reflect a surround suppression-like lateral inhibition in visual processing. A second problem is that the primary difference between the models is the activity of inhibitory (and disinhibitory) neurons, which are typically not recorded in neurophysiological experiments; thus, there is a lack of empirical data to which to fit the models. In the revision, we clarified that the model fitting to the Roitman & Shadlen data is for behavioral data only, and model unit activity traces are derived from models fit to behavioral data.

    That being said, we agree that a quantitative comparison of model activity predictions is helpful. Because the models are fit not to the neural data but to the behavioral data, rather than using likelihood-based measures like AIC and BIC we used a simple RMSE measure to compare the match between predicted and neural activity patterns (revised Fig. 6E, Fig 6-S4E, Fig 6-S5E). Please see response to recommendations below for details.
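
    As an illustration of the RMSE comparison described above, the following is a minimal sketch in Python. The function name `rmse`, the trace values, and the two "model" traces are hypothetical and chosen only for illustration; they are not taken from the manuscript or from the Roitman & Shadlen data.

```python
import numpy as np


def rmse(predicted, observed):
    """Root-mean-square error between two equally shaped activity traces."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if predicted.shape != observed.shape:
        raise ValueError("traces must have the same shape")
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


# Hypothetical firing-rate traces (spikes/s) sampled at the same time points.
observed_trace = np.array([20.0, 24.0, 31.0, 40.0])
model_a_trace = np.array([21.0, 25.0, 30.0, 38.0])
model_b_trace = np.array([18.0, 20.0, 26.0, 33.0])

rmse_a = rmse(model_a_trace, observed_trace)
rmse_b = rmse(model_b_trace, observed_trace)

# The model with the smaller RMSE matches the observed trace more closely.
print(f"model A RMSE = {rmse_a:.3f}, model B RMSE = {rmse_b:.3f}")
```

    Unlike AIC or BIC, this measure carries no penalty for the number of parameters, so it only describes how closely each predicted trace matches the recorded one.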

    Reviewer #2 (Public Review):

    The aim of this article was to create a biologically plausible model of decision-making that can both represent a choice's value and reproduce winner-take-all ramping behavior that determines the choice, two fundamental components of value-based decision-making. Both of these aspects have been studied and modeled independently but empirical studies have found that single neurons can switch between both of the aspects (i.e., from representing value to winner-take-all ramping behavior) in ways that are not well described by current biologically plausible models of decision making.

    The current article provides a thorough investigation of a new model (the local disinhibition decision model; LDDM) that has the goal of combining value representations and winner-takes-all ramping dynamics related to choice. Their model uses biologically plausible disinhibition to control the levels of inhibition in a local network of simulated neurons. Through a careful series of simulation experiments, they demonstrate that their network can first represent the value of different options, then switch to winner-takes-all ramping dynamics when a choice needs to be made. They further demonstrate that their single model reproduces key components of value-based and winner-takes-all dynamics found in both neural and behavioral data. They additionally conduct simulation studies to demonstrate that recurrent excitatory properties in their network produce value-persistence behavior that could be related to memory. They end by conducting a careful simulation study of the influence of GABA agonists that provide clear and testable predictions of their proposed role of inhibition in the neural processes that underlie decision-making. This last piece is especially important as it provides a clear set of predictions and experiments to help support or falsify their model.

    There are overall many strengths to this paper. As the authors note, current network models do not explain both value-based and ramping-like decision-making properties. Their thorough simulation studies and their validation against empirical neural and behavioral data will be of strong interest to neuroscientists and psychologists interested in value-based decision-making. The simulations related to persistence and the GABA-agonist experiments they propose also provide very clear guidelines for future research that would help advance the field of decision-making research.

    Although the methods and model were generally clear, there was a fair amount of emphasis on the role of recurrence in the LDDM, but very little evidence that recurrence was important or necessary for any of the empirical data examined. The authors do demonstrate the importance of recurrence in some of their simulation studies (particularly in their studies of persistence), but these would need to be compared against empirical data to be validated. Nevertheless, the model and thorough simulation investigations will likely help develop more precise theories of value-based decision-making.

    We appreciate the Reviewer’s thoughtful comments. These comments, especially those about anatomic recurrence and its relationship to the parameter 𝛼, inspired us to think more about how the current circuit differs from others, especially the implications of the parameters 𝛼 (i.e., self-excitation) and 𝛽 (i.e., local disinhibition). Recurrence is required to drive winner-take-all competition in the standard RNM of decision-making. However, we show here with both analytical and numerical approaches that recurrence helps WTA competition but is not necessary in our model. Instead, the key feature of the LDDM is that it uses disinhibition in conjunction with lateral inhibition to realize winner-take-all competition. As a result, the current model makes many predictions that differ from those of existing models, such as selective inhibition and flexible control of dynamics.

    In response to the Reviewer’s points and after careful consideration of the differential equations, we realized that in our model fitting, the 𝛼 parameter fitting to zero does not necessarily mean recurrence should be zero. The 𝛼 parameter shares a lot of similarity to the baseline gain control (parameter BG in our revision), and thus is unidentifiable in the current dataset. In the interest of parsimony, we did not include the parameter BG in the original manuscript, but now include it because it reveals the difficulty of interpreting fit 𝛼 values as simply the level of recurrence.

    Overall, disinhibition (𝛽) in the LDDM is required for WTA activity while recurrence (𝛼) can contribute but is not necessary; however, 𝛼 is theoretically important for generating persistent activity, with the caveat that in the current framework there is an unclear relationship between fit 𝛼 and recurrence. Regardless, we agree that the contribution of 𝛼 to the LDDM framework is worth further testing and examining with future empirical data.

    Reviewer #3 (Public Review):

    Shen et al. attempt to reconcile two distinct features of neural responses in frontoparietal areas during perceptual and value-guided decision-making into a single biologically realistic circuit model. First, previous work has demonstrated that value coding in the parietal cortex is relative (dependent on the value of all available choice options) and that this feature can be explained by divisive normalization, implemented using adaptive gain control in a recurrently connected circuit model (Louie et al, 2011). Second, a wealth of previous studies on perceptual decision-making (Gold & Shadlen 2007) have provided strong evidence that competitive winner-take-all dynamics implemented through recurrent dynamics characterized by mutual inhibition (Wang 2008) can account for categorical choice coding. The authors propose a circuit model whose key feature is the flexible gating of 'disinhibition', which captures both types of computation - divisive normalization and winner-take-all competition. The model is qualitatively able to explain the 'early' transients in parietal neural responses, which show signatures of divisive normalization indicating a relative value code, persistent activity during delay periods, and 'late' accumulation-to-bound type categorical responses prior to the report of choice/action onset.

    The attempt to integrate these two sets of findings by a unified circuit model is certainly interesting and would be useful to those who seek a tighter link between biologically realistic recurrent neural network models and neural recordings. I also appreciate the effort undertaken by the authors in using analytical tools to gain an understanding of the underlying dynamical mechanism of the proposed model. However, I have two major concerns. First, the manuscript in its current form lacks sufficient clarity, specifically in how some of the key parameters of the model are supposed to be interpreted (see point 1 below). Second, the authors overlook important previous work that is closely related to the ideas that are being presented in this paper (see point 2 below).

    1. The behavior of the proposed model is critically dependent on a single parameter 'beta' whose value, the authors claim, controls the switch from value-coding to choice-coding. However, the precise definition/interpretation of 'beta' seems inconsistent in different parts of the text. I elaborate on this issue in sub-points (1a-b) below:

    1a). For instance, in the equations of the main text (Equations 1-3), 'beta' is used to denote the coupling from the excitatory units (R) to the disinhibitory units (D) in Equations 1-3. However, in the main figures (Fig 2) and in the methods (Equation 5-8), 'beta' is instead used to refer to the coupling between the disinhibitory (D) and the inhibitory gain control units (G). Based on my reading of the text (and the predominant definition used by the authors themselves in the main figures and the methods), it seems that 'beta' should be the coupling between the D and G units.

    1b). A more general and critical issue is the failure to clearly specify whether this coupling of D-G units (parameterized by 'beta') should be interpreted as a 'functional' one, or an 'anatomical' one. A straightforward interpretation of the model equations (Equations 5-8) suggests that 'beta' is the synaptic weight (anatomical coupling) between the D and G units/populations. However, significant portions of the text seem to indicate otherwise (i.e., a 'functional' coupling). I elaborate on this in subpoints (i-iii) below:

    (1b-i). One of the main claims of the paper is that the value of 'beta' is under 'external' top-down control (Figure 2 caption, lines 124-126). When 'beta' equals zero, the model is consistent with the previous DNM model (dynamic normalization, Louie et al 2011), but for moderate/large non-zero values of 'beta', the network exhibits WTA dynamics. If 'beta' is indeed the anatomical coupling between D and G (as suggested by the equations of the model), then, are we to interpret that the synaptic weight between D-G is changed by the top-down control signal within a trial? My understanding of the text suggests that this is not in fact the case. Instead, the authors seem to want to convey that top-down input "functionally" gates the activity of D units. When the top-down control signal is "off", the disinhibitory units (D) are "effectively absent" (i.e., their activity is clamped at zero as in the schematic in Fig 2B), and therefore do not drive the G units. This would in turn be equivalent to there being no "anatomical coupling" between D and G. However, when the top-down signal is "on", D units have non-zero activity (schematic in Fig 2B), and therefore drive the G units, ultimately resulting in WTA-like dynamics.

    (1b-ii). Therefore, it seems like when the authors say that beta equals zero during the value coding phase they are almost certainly referring to a functional coupling from D to G, or else it would be inconsistent with their other claim that the proposed model flexibly reconfigures dynamics only through a single top-down input but without a change to the circuit architecture (reiterated in lines 398-399, 442-444, 544-546, 557-558, 579-590). However, such a 'functional' definition of 'beta' would seem inconsistent with how it should actually be interpreted based on the model equations, and also somewhat misleading considering the claim that the proposed network is a biologically realistic circuit model.

    (1b-iii). The only way to reconcile the results with an 'anatomical' interpretation of 'beta' is if there is a way to clamp the values of the 'D' units to zero when the top-down control signal is 'off'. Considering that the D units also integrate feed-forward inputs from the excitatory R units (Fig 2, Equations 1-3 or 5-8), this can be achieved either via a non-linearity, or if the top-down control input multiplicatively gates the synapse (consistent with the argument made in lines 115-116 and 585-586 that this top-down control signal is 'neuromodulatory' in nature). Neither of these two scenarios seems to be consistent with the basic definition of the model (Equations 1-3), which therefore confirms my suspicion that the interpretation of 'beta' being used in the text is more consistent with a 'functional' coupling from D to G.

    We thank the Reviewer for pointing out this confusion. We apologize that the original illustrations (Fig. 2A) and the differential equations in Methods (Eqs. 5-8) did not convey our ideas well. 𝛽 is intended to reference the coupling from R to D, not a change in the weights between D and G units. We realize there was some confusion on this point due to inconsistency between our original figures, text, and supplementary material.

    Given the lack of clarity in the previous version as well as the Reviewer’s questions, we now emphasize that 𝛽 represents a functional coupling between the R and D neurons. The biological assumption of the disinhibitory architecture is built on recent findings that VIP neurons in the cortex always inhibit other neighboring inhibitory cells, such as SST and PV neurons, and consequently disinhibit the neighboring primary neurons (e.g., Fu et al., 2014; Karnani et al., 2014, 2016). We did not see evidence in the literature of fast-changing (anatomic) connections between VIP and SST/PV. However, there is evidence that the responsiveness of VIP neurons to excitatory neurons can be modulated by changing the concentrations of neuromodulators, such as acetylcholine and serotonin (Prönneke et al., 2020). While the stereotype of neuromodulator action is slow dynamics, recent findings show that, for example, basal forebrain cholinergic neurons respond to reward and punishment with surprising speed and precision (18 ± 3 ms) (Hangya et al., 2015) to modulate arousal, attention, and learning in the neocortex. Given the large number of studies that identify long-term projections and neuromodulatory inputs to VIP neurons (e.g., Pfeffer et al., 2013; Pi et al., 2013; Alitto & Dan, 2013; Tremblay et al., 2016), we believe that it is more plausible to assume that the connection weights between R and D in our case are quickly modulated within a trial.

    To clarify this issue in the revised manuscript, we made the following corrections:

    1. We repositioned the 𝛽 parameter in Fig. 2A onto the connection from R to D, to align with the description of 𝛽 modulating R to D in the main text.

    2. We modified the differential equations 5-8 (now numbered as Eqs. 28-32) in Methods (p. 61) to include the disinhibitory unit D as an independent control from the inhibitory unit I, in order to be consistent with the disinhibitory D units in LDDM. This change makes only tiny differences in the model predictions (please see dynamics simulated after the change in Fig. 2-figure supplement 1B).

    3. We updated the neural circuit motif in Fig. 2-figure supplement 1A accordingly.
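
    To make the role of 𝛽 as an R-to-D coupling concrete, the following is a minimal simulation sketch in Python. The equations in it are an assumed, simplified form written only for illustration (excitatory units R divided by gain-control units G, with disinhibitory units D driven by 𝛽R and subtracting from the input to G); they are not copied from the manuscript's Eqs. 1-3 or Eqs. 28-32. The function name `simulate_circuit`, the rectification of G, and all parameter values are likewise assumptions.

```python
import numpy as np


def simulate_circuit(values, alpha=0.0, beta=0.0, omega=1.0,
                     tau=0.1, dt=0.001, t_end=5.0):
    """Euler-integrate an assumed R-G-D motif and return the final (R, G, D).

    Assumed form (illustrative only, not the manuscript's equations):
        tau dR_i/dt = -R_i + (V_i + alpha * R_i) / (1 + max(G_i, 0))
        tau dG_i/dt = -G_i + omega * sum_j R_j - D_i
        tau dD_i/dt = -D_i + beta * R_i
    """
    v = np.asarray(values, dtype=float)
    n = v.size
    w = np.full((n, n), omega)           # all-to-all R -> G weights (assumed)
    r, g, d = np.zeros(n), np.zeros(n), np.zeros(n)
    for _ in range(int(round(t_end / dt))):
        gain = np.maximum(g, 0.0)        # rectified so the divisor stays >= 1
        dr = (-r + (v + alpha * r) / (1.0 + gain)) / tau
        dg = (-g + w @ r - d) / tau
        dd = (-d + beta * r) / tau       # beta scales the R -> D coupling
        r, g, d = r + dt * dr, g + dt * dg, d + dt * dd
    return r, g, d


# With beta = 0 the D units receive no drive and stay at zero, so the sketch
# reduces to a divisive-normalization circuit.
r, g, d = simulate_circuit([3.0, 1.0], beta=0.0)
print("R:", r, "ratio:", r[0] / r[1])
```

    In this sketch, setting `beta=0.0` leaves the D units silent, and the steady-state responses satisfy R_i = V_i / (1 + sum_j R_j), so they preserve the ratio of the inputs. What happens for nonzero `beta` depends on the parameter values chosen and is not asserted here.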

    2. The main contribution of the manuscript is to integrate the characteristics of the dynamic normalization model (Louie et al, 2011) and the winner-take-all behavior of recurrent circuit models that employ mutual inhibition (Wang, 2008), into a circuit motif that can flexibly switch between these two computations. The main ingredient for achieving this seems to be the dynamical 'gating' of the disinhibition, which produces a switch in the dynamics, from point-attractor-like 'stable' dynamics during value coding to saddle-point-like 'unstable' dynamics during categorical choice coding. While the specific use of disinhibition to switch between these two computations is new, the authors fail to cite previous work that has explored similar ideas that are closely related to the results being presented in their study. It would be very useful if the authors can elaborate on the relationship between their work and some of these previous studies. I elaborate on this point in (a-b) below:

    2a) While the authors may be correct in claiming that RNM models based on mutual inhibition are incapable of relative value coding, it has already been shown previously that RNM models characterized by mutual inhibition can be flexibly reconfigured to produce dynamical regimes other than those that just support WTA competition (Machens, Romo & Brody, 2005). Similar to the behavior of the proposed model (Fig 9), the model by Machens and colleagues can flexibly switch between point-attractor dynamics (during stimulus encoding), line-attractor dynamics (during working memory), and saddle-point dynamics (during categorical choice) depending on the task epoch. It achieves this via a flexible reconfiguration of the external inputs to the RNM. Therefore, the authors should acknowledge that the mechanism they propose may just be one of many potential ways in which a single circuit motif is reconfigured to produce different task dynamics. This also brings into question their claim that the type of persistent activity produced by the model is "novel", which I don't believe it is (see Machens et al 2005 for the same line-attractor-based mechanism for working memory)

    We thank the Reviewer for pointing out the conceptual similarities between the LDDM and the Machens Romo Brody model, and now include a discussion of the link between the two early in the revised Discussion (p. 38, lines 826-837). Please see response to recommendations below for a more detailed discussion of this point.

    2b) The authors also fail to cite or describe their work in relation to previous work that has used disinhibition-based circuit motifs to achieve all 3 proposed functions of their model: (i) divisive normalization (Litwin-Kumar et al, 2016), (ii) flexible gating/decision making (Yang et al, 2016), and (iii) working memory maintenance (Kim & Sejnowski, 2021).

    The Reviewer notes several relevant papers, and we have now discussed them and their relationship to the LDDM in a revised Discussion section (pp. 35-36). Please see response to recommendations below for more details.

  2. eLife assessment

    This work provides a promising first pass at providing an integrative model for how decisions arise from neural circuits. The approach is novel but lacks a more rigorous vetting against alternative model formulations to be able to determine its true significance. More stringent evaluations of the model in the context of existing work, as well as a clearer description of the goals and implementation of the approach, would help to address these concerns.

  3. Reviewer #1 (Public Review):

    This work presents a unification model (of sorts) for explaining how the flow of evidence through networks can be controlled during decision-making. The authors combine two general frameworks previously used as neural models of cortical decision-making, dynamic normalization (which implements value encoding via firing activity) and recurrent network models (which capture winner-take-all selection processes), into a unified model called the local disinhibition-based decision model (LDDM). The simple motif of the LDDM allows for the disinhibition of excitatory cells that represent the engagement of individual actions that happens through a recurrent inhibitory loop (i.e., a leaky competing accumulator). The authors show that the LDDM is effective at explaining both decision dynamics and the properties of cortical cells during perceptual decision-making tasks.

    All in all, I thought this was an interesting study with an ambitious goal. But like any good study, there are some open issues worth noting and correcting.

    MAJOR CONCERNS

    1. Big picture

    This was a comprehensive and extremely well-vetted set of theoretical experiments. However, the scope and complexity also made the take-home message hard to discern. The abstract and most of the introduction focus on the framing of LDDM as a hybrid of dynamic normalization models (DNM) and recurrent network models (RNMs). This is sold as a unification of value normalization and selection into a novel unified framework. Then the focus shifts to the role of disinhibition in decision-making. Then in the Discussion, the goal is stated as to determine whether the LDDM generates persistent activity and does this activity differ from RNMs. As a reader, it seems like the paper jumps between two high-level goals: 1) the unification of DNM and RNM architectures, and 2) the role of disinhibition. This constant changing makes it hard to focus as the reader goes on. So what is the big picture goal specifically?

    Also, the framing of value normalization and WTA as a novel computational goal is a bit odd as this is a major focus of the field of reinforcement learning (both abstractly at the computational level and more concretely in models of the circuits that regulate it). I know that the authors do not think they are the first to unify value judgements with selection criteria. The writing just comes across that way and should be clarified.

    2. Link to other models

    The LDDM is described as a novel unification of value normalization and winner-take-all (WTA) selection, combining value processing and selection. While the authors do an excellent job of referencing a significant chunk of the decision neuroscience literature (160 references!), the motif they end up designing has a highly similar structure to a well-known neural circuit linked to decision-making: the cortico-basal ganglia pathways. Extensive work over the past 20+ years has highlighted how cortico-basal ganglia loops work via disinhibition of cortical decision units in a similar way to the LDDM (see the work by Michael Frank, Wei Wei, Jonathan Rubin, Fred Hamker, Rafal Bogacz, and many others). It was surprising not to see this link brought up in the paper, as most of the framing was on the possibility of the LDDM representing cortical motifs. As far as I know, there is no evidence for such architectures in the cortex, but there is in these cortico-basal ganglia systems.

    3. Model evaluations

    The authors do a great job of extensively probing the LDDM under different conditions and against some empirical data. However, most of the time there is no "control" model or current state-of-the-art model that the LDDM is being compared against. In a few of the simulation experiments, the LDDM is compared against the DNM and RNM alone, so as to show how the two components of the LDDM motif compare against the holistic model itself. But this component model comparison is inconsistently used across simulation experiments.

    Also, it is worth asking whether the DNM and RNM are appropriate comparison models to vet the LDDM against for two reasons. First, these are the components of the full LDDM. So these tests show us how the two underlying architectural systems that go into LDDM perform independently, but not necessarily how the LDDM compares against other architectures without these features. Second, as pointed out in my previous comment, the LDDM is a more complex model, with more parameters, than either the DNM or RNM. The field of decision neuroscience is awash in competing decision models (including probabilistic attractor models, non-recurrent integrators, etc.). If we really want to understand the utility of the LDDM, it would be good to know how it performs against similarly complex models, as opposed to its two underlying component models.

    4. Comparison to physiological data

    I quite enjoyed the comparisons of the excitatory cell activity to empirical data from the Shadlen lab experiments. However, these were largely qualitative in nature. In conjunction with my prior point on the models that the LDDM is being compared against, it would be ideal to have a direct measure of model fits that can be used to compare the performance of different competing "control" models. These measures would have to account for differences in model complexity (e.g., AIC or BIC), but such an analysis would help the reader understand the utility of the LDDM in connecting with empirical data much better.
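    As a minimal sketch of the complexity-penalized comparison suggested above, the standard AIC and BIC formulas can be computed from each model's maximized log-likelihood and parameter count. The model names, log-likelihoods, parameter counts, and number of observations below are made-up placeholders for illustration only, not values from the manuscript.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fit summaries (placeholders, NOT results from the paper).
fits = {
    "candidate_model": {"log_likelihood": -1200.0, "n_params": 7},
    "control_model": {"log_likelihood": -1210.0, "n_params": 4},
}
n_obs = 2000  # assumed number of fitted observations (e.g., trials)

scores = {
    name: {
        "aic": aic(f["log_likelihood"], f["n_params"]),
        "bic": bic(f["log_likelihood"], f["n_params"], n_obs),
    }
    for name, f in fits.items()
}

# The model with the lowest score is preferred under each criterion.
best_aic = min(scores, key=lambda m: scores[m]["aic"])
best_bic = min(scores, key=lambda m: scores[m]["bic"])

for name, s in scores.items():
    print(f"{name}: AIC={s['aic']:.2f}, BIC={s['bic']:.2f}")
print("preferred by AIC:", best_aic)
print("preferred by BIC:", best_bic)
```

    With these placeholder numbers the two criteria disagree, because BIC penalizes the extra parameters more heavily at this sample size. This is why reporting both, alongside the raw likelihoods, would be informative.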

  4. Reviewer #2 (Public Review):

    The aim of this article was to create a biologically plausible model of decision-making that can both represent a choice's value and reproduce winner-take-all ramping behavior that determines the choice, two fundamental components of value-based decision-making. Both of these aspects have been studied and modeled independently, but empirical studies have found that single neurons can switch between the two (i.e., from representing value to winner-take-all ramping behavior) in ways that are not well described by current biologically plausible models of decision-making.

    The current article provides a thorough investigation of a new model (the local disinhibition decision model; LDDM) that has the goal of combining value representations and winner-takes-all ramping dynamics related to choice. Their model uses biologically plausible disinhibition to control the levels of inhibition in a local network of simulated neurons. Through a careful series of simulation experiments, they demonstrate that their network can first represent the value of different options, then switch to winner-takes-all ramping dynamics when a choice needs to be made. They further demonstrate that their single model reproduces key components of value-based and winner-takes-all dynamics found in both neural and behavioral data. They additionally conduct simulation studies to demonstrate that recurrent excitatory properties in their network produce value-persistence behavior that could be related to memory. They end by conducting a careful simulation study of the influence of GABA agonists that provide clear and testable predictions of their proposed role of inhibition in the neural processes that underlie decision-making. This last piece is especially important as it provides a clear set of predictions and experiments to help support or falsify their model.

    There are overall many strengths to this paper. As the authors note, current network models do not explain both value-based and ramping-like decision-making properties. Their thorough simulation studies and their validation against empirical neural and behavioral data will be of strong interest to neuroscientists and psychologists interested in value-based decision-making. The simulations related to persistence and the GABA-agonist experiments they propose also provide very clear guidelines for future research that would help advance the field of decision-making research.

    Although the methods and model were generally clear, there was a fair amount of emphasis on the role of recurrence in the LDDM, but very little evidence that recurrence was important or necessary for any of the empirical data examined. The authors do demonstrate the importance of recurrence in some of their simulation studies (particularly in their studies of persistence), but these would need to be compared against empirical data to be validated. Nevertheless, the model and thorough simulation investigations will likely help develop more precise theories of value-based decision-making.

  5. Reviewer #3 (Public Review):

    Shen et al. attempt to reconcile two distinct features of neural responses in frontoparietal areas during perceptual and value-guided decision-making into a single biologically realistic circuit model. First, previous work has demonstrated that value coding in the parietal cortex is relative (dependent on the value of all available choice options) and that this feature can be explained by divisive normalization, implemented using adaptive gain control in a recurrently connected circuit model (Louie et al, 2011). Second, a wealth of previous studies on perceptual decision-making (Gold & Shadlen 2007) have provided strong evidence that competitive winner-take-all dynamics implemented through recurrent dynamics characterized by mutual inhibition (Wang 2008) can account for categorical choice coding. The authors propose a circuit model whose key feature is the flexible gating of 'disinhibition', which captures both types of computation - divisive normalization and winner-take-all competition. The model is qualitatively able to explain the 'early' transients in parietal neural responses, which show signatures of divisive normalization indicating a relative value code, persistent activity during delay periods, and 'late' accumulation-to-bound type categorical responses prior to the report of choice/action onset.

    The attempt to integrate these two sets of findings by a unified circuit model is certainly interesting and would be useful to those who seek a tighter link between biologically realistic recurrent neural network models and neural recordings. I also appreciate the effort undertaken by the authors in using analytical tools to gain an understanding of the underlying dynamical mechanism of the proposed model. However, I have two major concerns. First, the manuscript in its current form lacks sufficient clarity, specifically in how some of the key parameters of the model are supposed to be interpreted (see point 1 below). Second, the authors overlook important previous work that is closely related to the ideas that are being presented in this paper (see point 2 below).

    1. The behavior of the proposed model is critically dependent on a single parameter 'beta' whose value, the authors claim, controls the switch from value-coding to choice-coding. However, the precise definition/interpretation of 'beta' seems inconsistent in different parts of the text. I elaborate on this issue in sub-points (1a-b) below:

    1a). For instance, in the equations of the main text (Equations 1-3), 'beta' is used to denote the coupling from the excitatory units (R) to the disinhibitory units (D). However, in the main figures (Fig 2) and in the methods (Equations 5-8), 'beta' is instead used to refer to the coupling between the disinhibitory (D) and the inhibitory gain control units (G). Based on my reading of the text (and the predominant definition used by the authors themselves in the main figures and the methods), it seems that 'beta' should be the coupling between the D and G units.

    1b). A more general and critical issue is the failure to clearly specify whether this coupling of D-G units (parameterized by 'beta') should be interpreted as a 'functional' one, or an 'anatomical' one. A straightforward interpretation of the model equations (Equations 5-8) suggests that 'beta' is the synaptic weight (anatomical coupling) between the D and G units/populations. However, significant portions of the text seem to indicate otherwise (i.e., a 'functional' coupling). I elaborate on this in subpoints (i-iii) below:

    (1b-i). One of the main claims of the paper is that the value of 'beta' is under 'external' top-down control (Figure 2 caption, lines 124-126). When 'beta' equals zero, the model is consistent with the previous DNM model (dynamic normalization, Louie et al 2011), but for moderate/large non-zero values of 'beta', the network exhibits WTA dynamics. If 'beta' is indeed the anatomical coupling between D and G (as suggested by the equations of the model), then are we to interpret that the synaptic weight between D-G is changed by the top-down control signal within a trial? My understanding of the text suggests that this is not in fact the case. Instead, the authors seem to want to convey that top-down input "functionally" gates the activity of D units. When the top-down control signal is "off", the disinhibitory units (D) are "effectively absent" (i.e., their activity is clamped at zero as in the schematic in Fig 2B), and therefore do not drive the G units. This would in turn be equivalent to there being no "anatomical coupling" between D and G. However, when the top-down signal is "on", D units have non-zero activity (schematic in Fig 2B), and therefore drive the G units, ultimately resulting in WTA-like dynamics.

    (1b-ii). Therefore, it seems like when the authors say that beta equals zero during the value coding phase they are almost certainly referring to a functional coupling from D to G, or else it would be inconsistent with their other claim that the proposed model flexibly reconfigures dynamics only through a single top-down input but without a change to the circuit architecture (reiterated in lines 398-399, 442-444, 544-546, 557-558, 579-590). However, such a 'functional' definition of 'beta' would seem inconsistent with how it should actually be interpreted based on the model equations, and also somewhat misleading considering the claim that the proposed network is a biologically realistic circuit model.

    (1b-iii). The only way to reconcile the results with an 'anatomical' interpretation of 'beta' is if there is a way to clamp the values of the 'D' units to zero when the top-down control signal is 'off'. Considering that the D units also integrate feed-forward inputs from the excitatory R units (Fig 2, Equations 1-3 or 5-8), this can be achieved either via a non-linearity, or if the top-down control input multiplicatively gates the synapse (consistent with the argument made in lines 115-116 and 585-586 that this top-down control signal is 'neuromodulatory' in nature). Neither of these two scenarios seems to be consistent with the basic definition of the model (Equations 1-3), which therefore confirms my suspicion that the interpretation of 'beta' being used in the text is more consistent with a 'functional' coupling from D to G.
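    To make the distinction in (1b-i) to (1b-iii) concrete, the two readings can be written schematically as follows. This is an illustrative sketch only: the gate $c(t) \in \{0, 1\}$, the weight $w$, and the time constant $\tau_D$ are notation assumed here and are not taken from the manuscript's equations.

```latex
\begin{aligned}
\text{Gated input to } D \text{ (fixed anatomical weight } \beta\text{):}\quad
  & \tau_D \frac{dD_i}{dt} = -D_i + c(t)\, w\, R_i,
  && \text{inhibitory drive onto } G_i = \beta\, D_i \\[4pt]
\text{Gated coupling (functional } \beta\text{):}\quad
  & \tau_D \frac{dD_i}{dt} = -D_i + w\, R_i,
  && \text{inhibitory drive onto } G_i = \beta\, c(t)\, D_i
\end{aligned}
```

    In the first form, setting $c(t) = 0$ lets $D_i$ decay to zero, so the D-to-G drive vanishes even though $\beta$ itself never changes. In the second form, $D_i$ remains driven by $R_i$ and it is the effective coupling $\beta\, c(t)$ that is switched off. Stating explicitly which of these (or which alternative) is intended would resolve the ambiguity.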

    2. The main contribution of the manuscript is to integrate the characteristics of the dynamic normalization model (Louie et al, 2011) and the winner-take-all behavior of recurrent circuit models that employ mutual inhibition (Wang, 2008), into a circuit motif that can flexibly switch between these two computations. The main ingredient for achieving this seems to be the dynamical 'gating' of the disinhibition, which produces a switch in the dynamics, from point-attractor-like 'stable' dynamics during value coding to saddle-point-like 'unstable' dynamics during categorical choice coding. While the specific use of disinhibition to switch between these two computations is new, the authors fail to cite previous work that has explored closely related ideas. It would be very useful if the authors could elaborate on the relationship between their work and some of these previous studies. I elaborate on this point in (2a-b) below:

    2a) While the authors may be correct in claiming that RNM models based on mutual inhibition are incapable of relative value coding, it has already been shown previously that RNM models characterized by mutual inhibition can be flexibly reconfigured to produce dynamical regimes other than those that just support WTA competition (Machens, Romo & Brody, 2005). Similar to the behavior of the proposed model (Fig 9), the model by Machens and colleagues can flexibly switch between point-attractor dynamics (during stimulus encoding), line-attractor dynamics (during working memory), and saddle-point dynamics (during categorical choice) depending on the task epoch. It achieves this via a flexible reconfiguration of the external inputs to the RNM. Therefore, the authors should acknowledge that the mechanism they propose may just be one of many potential ways in which a single circuit motif is reconfigured to produce different task dynamics. This also calls into question their claim that the type of persistent activity produced by the model is "novel", which I don't believe it is (see Machens et al 2005 for the same line-attractor-based mechanism for working memory).

    2b) The authors also fail to cite or describe their work in relation to previous work that has used disinhibition-based circuit motifs to achieve all 3 proposed functions of their model: (i) divisive normalization (Litwin-Kumar et al, 2016), (ii) flexible gating/decision making (Yang et al, 2016), and (iii) working memory maintenance (Kim & Sejnowski, 2021).