Multi-study fMRI outlooks on subcortical BOLD responses in the stop-signal paradigm

Curation statements for this article:
  • Curated by eLife


    eLife Assessment

    This study aggregates across five fMRI datasets and reports that a network of brain areas previously associated with response inhibition processes, including several in the basal ganglia, are more active on failed stop than successful stop trials. This study is valuable as a well-powered investigation of fMRI measures of stopping, and following revisions provides solid evidence for its conclusions.

This article has been Reviewed by the following groups


Abstract

This study investigates the functional network underlying response inhibition in the human brain, particularly the role of the basal ganglia in successful action cancellation. Functional magnetic resonance imaging (fMRI) approaches have frequently used the stop-signal task to examine this network. We merge five such datasets, using a novel aggregatory method allowing the unification of raw fMRI data across sites. This meta-analysis, along with other recent aggregatory fMRI studies, does not find evidence for the recruitment of the hyperdirect or indirect cortico-basal-ganglia pathways in successful response inhibition. What we do find are large subcortical activity profiles for failed stop trials. We discuss possible explanations for the mismatch of findings between the fMRI results presented here and results from other research modalities that have implicated nodes of the basal ganglia in successful inhibition. We also highlight the substantial effect smoothing can have on the conclusions drawn from task-specific general linear models. First and foremost, this study presents a proof of concept for meta-analytical methods that enable the merging of extensive, unprocessed, or unreduced datasets. It demonstrates the significant potential that open-access data sharing can offer to the research community. With an increasing number of datasets being shared publicly, researchers will have the ability to conduct meta-analyses on more than just summary data.

Article activity feed

  1. eLife Assessment

    This study aggregates across five fMRI datasets and reports that a network of brain areas previously associated with response inhibition processes, including several in the basal ganglia, are more active on failed stop than successful stop trials. This study is valuable as a well-powered investigation of fMRI measures of stopping, and following revisions provides solid evidence for its conclusions.

  2. Reviewer #2 (Public review):

    This work aggregates data across 5 openly available stopping studies (3 at 7 tesla and 2 at 3 tesla) to evaluate activity patterns across the common contrasts of Failed Stop (FS) > Go, FS > stop success (SS), and SS > Go. Previous work has implicated a set of regions that tend to be positively active in one or more of these contrasts, including the bilateral inferior frontal gyrus, preSMA, and multiple basal ganglia structures. However, the authors argue that upon closer examination, many previous papers have not found subcortical structures to be more active on SS than FS trials, bringing into question whether they play an essential role in (successful) inhibition. In order to evaluate this with more data and power, the authors aggregate across five datasets and find many areas that are *more* active for FS than SS, including bilateral preSMA, GPe, thalamus, and VTA. They argue that this brings into question the role of these areas in inhibition, based upon the assumption that areas involved in inhibition should be more active on successful stop than failed stop trials, not the opposite as they observed.

    Comments on revisions:

    The authors have been responsive to the feedback of both reviewers and they have significantly improved the manuscript. I now judge the work as valuable and solid. The authors have achieved their aims to characterize subcortical BOLD activation in the stop-signal paradigm.

  3. Author response:

    The following is the authors’ response to the previous reviews.

    Reviewer 1:

    This study is one in a series of excellent papers by the Forstmann group focusing on the ability of fMRI to reliably detect activity in small subcortical nuclei - in this case, specifically those purportedly involved in the hyper- and indirect inhibitory basal ganglia pathways. I have been very fond of this work for a long time, beginning with the demonstration by De Hollander, Forstmann et al. (HBM 2017) that 3T fMRI (as well as many 7T imaging sequences) does not afford a sufficient signal-to-noise ratio to reliably image these small subcortical nuclei. This work has done a lot to reshape my view of seminal past studies of subcortical activity during inhibitory control, including some that have several thousand citations.

    Comments on revised version:

    This is my second review of this article, now entitled "Multi-study fMRI outlooks on subcortical BOLD responses in the stop-signal paradigm" by Isherwood and colleagues.

    The authors have been very responsive to the initial round of reviews.

    I still think it would be helpful to see a combined investigation of the available 7T data, just to really drive the point home that even with the best parameters and a multi-study sample size, fMRI cannot detect any increases in BOLD activity on successful stop compared to go trials. However, I agree with the authors that these "sub samples still lack the temporal resolution seemingly required for looking at the processes in the SST." As such, I don't have any more feedback.

    We thank the reviewer for their positive feedback, and for their thorough and constructive comments on our initial submission.

    Reviewer 2:

    This work aggregates data across 5 openly available stopping studies (3 at 7 tesla and 2 at 3 tesla) to evaluate activity patterns across the common contrasts of Failed Stop (FS) > Go, FS > stop success (SS), and SS > Go. Previous work has implicated a set of regions that tend to be positively active in one or more of these contrasts, including the bilateral inferior frontal gyrus, preSMA, and multiple basal ganglia structures. However, the authors argue that upon closer examination, many previous papers have not found subcortical structures to be more active on SS than FS trials, bringing into question whether they play an essential role in (successful) inhibition. In order to evaluate this with more data and power, the authors aggregate across five datasets and find many areas that are *more* active for FS than SS, including bilateral preSMA, GPe, thalamus, and VTA. They argue that this brings into question the role of these areas in inhibition, based upon the assumption that areas involved in inhibition should be more active on successful stop than failed stop trials, not the opposite as they observed.

    Since the initial submission, the authors have improved their theoretical synthesis and changed their SSRT calculation method to the more appropriate integration method with replacement for go omissions. They have also done a better job of explaining how these fMRI results situate within the broader response inhibition literature including work using other neuroscience methods.

    They have also included a new Bayes Factor analysis. In the process of evaluating this new analysis, I recognized the following comments that I believe justify additional analyses and discussion:

    First, if I understand the author's pipeline, for the ROI analyses it is not appropriate to run FSL's FILM method on the data that were generated by repeating the same time series across all voxels of an ROI. FSL's FILM uses neighboring voxels in parts of the estimation to stabilize temporal correlation and variance estimates and was intended and evaluated for use on voxelwise data. Instead, I believe it would be more appropriate to average the level 1 contrast estimates over the voxels of each ROI to serve as the dependent variables in the ROI analysis.

    We agree with the reviewer’s assertion that this approach could create estimation problems. However, in this instance, we turned off the spatial smoothing procedure that FSL’s FILM normally uses for estimating the amount of autocorrelation – thus, the autocorrelation was estimated based on each voxel’s timeseries individually. We also confirmed that all voxels within each ROI had identical statistics, which would not be the case if the autocorrelation estimates differed per voxel. We have added the following text to the Methods section under fMRI analysis: ROI-wise:

    Note that the standard implementation of FSL FILM uses a spatial smoothing procedure prior to estimating temporal autocorrelations which is suitable for use only on voxelwise data (Woolrich et al., 2001). We therefore turned this spatial smoothing procedure off and instead estimated autocorrelation using each voxel’s individual timeseries.
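    The per-voxel estimation described here can be illustrated with a minimal sketch. The code below is not FSL's FILM implementation (which fits a more elaborate autocorrelation model; Woolrich et al., 2001) but shows the core idea, simplified to AR(1): the coefficient is estimated from a single voxel's own residuals, with no spatial pooling across neighbouring voxels. The function name and the AR(1) simplification are illustrative assumptions.

    ```python
    import numpy as np

    def prewhiten_ar1(y, X):
        """Estimate an AR(1) coefficient from one voxel's OLS residuals and
        prewhiten the timeseries and design matrix with it. Illustrative
        sketch only: FSL's FILM fits a more flexible autocorrelation model
        (Woolrich et al., 2001), here simplified to AR(1)."""
        # Initial OLS fit to obtain residuals for this voxel alone
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        resid = y - X @ beta
        # Lag-1 autocorrelation of the residuals (no spatial pooling)
        rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]
        # Whitening transform: y_t - rho * y_{t-1} (and likewise for X)
        y_w = y[1:] - rho * y[:-1]
        X_w = X[1:] - rho * X[:-1]
        return y_w, X_w, rho
    ```

    Because the coefficient is estimated from one timeseries at a time, two voxels with identical data always yield identical statistics, which is the property the authors used to verify their setting.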

    Second, for the group-level ROI analyses there seem to be inconsistencies when comparing the z-statistics (Figure 3) to the Bayes Factors (Figure 4), in that very similar z-statistics have very different Bayes Factors within the same contrast across different brain areas, which seemed surprising (e.g., a z of 6.64 has a BF of .858 while another with a z of 6.76 has a BF of 3.18). The authors do briefly discuss some instances in which the frequentist and Bayesian results differ, but they never explain why similar z-stats yield very different Bayes Factors for a given contrast across different brain areas. I believe a discussion of this would be useful.

    We thank the reviewer for their keen observation, and agree that this is indeed a strange inconsistency. Upon reviewing this issue, we came across an error in our analysis pipeline, which led to inconsistent scaling of the parameter estimates between datasets. We corrected this error, and included updated figures (Figures 3, 4, and Supplementary Figure 5) which now show improved correspondence between the frequentist results from FSL and the Bayesian results.

    We have updated the text of the Results section accordingly. In this revision, we have also updated all BFs to be expressed in log10 form, to ensure consistency for the reader. Updates to the manuscript are given below.

    Results: Behavioural Analyses:

    Consistent with the assumptions of the standard horse-race model (Logan & Cowan, 1984), the median failed stop RT is significantly faster than the median go RT within all datasets (Aron_3T: p < .001, BFlog10 = 2.77; Poldrack_3T: p < .001, BFlog10 = 23.49; deHollander_7T: p < .001, BFlog10 = 8.88; Isherwood_7T: p < .001, BFlog10 = 2.95; Miletic_7T: p = .0019, BFlog10 = 1.35). Mean SSRTs were calculated using the integration method and are all within the normal range across the datasets.

    Results: ROI-wise GLMs:

    To further statistically compare the functional results between datasets, we then fit a set of GLMs using the canonical HRF with a temporal derivative to the timeseries extracted from each ROI. Below we show the results of the group-level ROI analyses over all datasets using z-scores (Fig. 3) and log-transformed Bayes Factors (BF; Fig. 4). Note that these values were time-locked to the onset of the go signal. See Supplementary Figure 5 for analyses where the FS and SS trials were time-locked to the onset of the stop signal. To account for multiple comparisons, threshold values were set using the FDR method for the frequentist analyses.

    For the FS > GO contrast, the frequentist analysis found significant positive z-scores in all regions bar left and right M1, and the left GPi. The right M1 showed a significant negative z-score; left M1 and left GPi showed no significant effect in this contrast. The BFs showed moderate or greater evidence for the alternative hypothesis in bilateral IFG, preSMA, caudate, STN, Tha, and VTA, and right GPe. Bilateral M1 and left GPi showed moderate evidence for the null. Evidence for other ROIs was anecdotal (see Fig 4).

    For the FS > SS contrast, we found significant positive z-scores in all regions except the left GPi. The BFs showed moderate or greater evidence for right IFG, right GPi, and bilateral M1, preSMA, Tha, and VTA, and moderate evidence for the null in left GPi. Evidence for other ROIs was anecdotal (see Fig 4).

    For the SS > GO contrast we found significant positive z-scores in bilateral IFG, right Tha, and right VTA, and significant negative z-scores in bilateral M1, left GPe, right GPi, and bilateral putamen. The BFs showed moderate or greater evidence for the alternative hypothesis in bilateral M1 and right IFG, and moderate or greater evidence for the null in left preSMA, bilateral caudate, bilateral GPe, left GPi, bilateral putamen, and bilateral SN. Evidence for other ROIs was anecdotal (see Fig 4).

    Although the frequentist and Bayesian analyses are mostly in line with one another, there were also some differences, particularly in the contrasts with FS. In the FS > GO contrast, the interpretation of the GPi, GPe, putamen, and SN differ. The frequentist model suggests significantly increased activation for these regions (except left GPi) in FS trials. In the Bayesian model, this evidence was found to be anecdotal in the SN and right GPi, and moderate in the right GPe, while finding anecdotal or moderate evidence for the null hypothesis in the left GPe, left GPi, and putamen. For the FS > SS contrast, the frequentist analysis showed significant activation in all regions except for the left GPi, whereas the Bayesian analysis found this evidence to be only anecdotal, or in favour of the null for a large number of regions (see Fig 4 for details).
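    The FDR correction used to set the frequentist thresholds above is commonly implemented as the Benjamini-Hochberg step-up procedure. The sketch below is a generic illustration of that procedure, not the exact routine used in the paper, and the function name is an assumption.

    ```python
    import numpy as np

    def fdr_bh(pvals, q=0.05):
        """Benjamini-Hochberg step-up procedure. Returns a boolean mask of
        hypotheses rejected at false discovery rate q. Generic illustration;
        the exact routine used for the thresholds in the text may differ."""
        p = np.asarray(pvals, dtype=float)
        order = np.argsort(p)
        m = len(p)
        # Compare the i-th smallest p-value against q * i / m
        below = p[order] <= q * np.arange(1, m + 1) / m
        # Reject all hypotheses up to the largest i passing the comparison
        k = below.nonzero()[0].max() + 1 if below.any() else 0
        mask = np.zeros(m, dtype=bool)
        mask[order[:k]] = True
        return mask
    ```

    Unlike a fixed blanket threshold, the rejection cut-off adapts to the observed p-value distribution, which is why each contrast ends up with its own threshold.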

    Since the Bayes Factor analysis appears to be based on repeated measures ANOVA and the z-statistics are from Flame1+2, the BayesFactor analysis model does not pair with the frequentist analysis model very cleanly. To facilitate comparison, I would recommend that the same repeated measures ANOVA model should be used in both cases. My reading of the literature is that there is no need to be concerned about any benefits of using Flame being lost, since heteroscedasticity does not impact type I errors and will only potentially impact power.

    We agree with the reviewer that there are differences between the two analyses. The advantage of the z-statistics from FSL’s flame 1+2 is that these are based on a multi-level model in which measurement error in the first level (i.e., subject level) is taken into account in the group-level analysis. This is an advantage especially in the current paper since the datasets differ strongly in the degree of measurement error, both due to the differences in field strength and in the number of trials (and volumes). Although multilevel Bayesian approaches exist, none (except by use of custom code) allow for convolution of a design matrix with the HRF like typical MRI analyses. Thus, we extracted the participant-level parameter estimates (converted to percent signal change), and only estimated the dataset- and group-level parameters with the BayesFactor package. As such, this approach effectively ignores measurement error. However, despite these differences in the analyses, the general conclusions from the Bayesian and frequentist analyses are very aligned after we corrected for the error described above. The Bayesian results are more conservative, which can be explained by the unfiltered participant-level measurement error increasing the uncertainty of the group-level parameter estimates. At worst, the BFs represent the lower bounds of the true effect, and are thus safe to interpret.

    We have also included an additional figure (Supplementary Figure 7) that shows the correspondence between the BFs and the z scores.

    Though frequentist statistics suggest that many basal ganglia structures are significantly more active in the FS > SS contrast (see 2nd row of Figure 3), the Bayesian analyses are much more equivocal, with no basal ganglia areas showing Log10BF > 1 (which would be indicative of strong evidence). The authors suggest that "the frequentist and Bayesian analyses are mostly in line with one another", but in my view, this frequentist vs. Bayesian analysis for the FS > SS contrast seems to suggest substantially different conclusions. More specifically, the frequentist analyses suggest greater activity in FS than SS in most basal ganglia ROIs (all but 2), but the Bayesian analysis did not find *any* basal ganglia ROIs with strong evidence for the alternative hypothesis (or a difference), and several with more evidence for the null than the alternative hypothesis. This difference between the frequentist and Bayesian analyses seems to warrant discussion, but unless I overlooked it, the Bayesian analyses are not mentioned in the Discussion at all. In my view, the frequentist analyses are treated as the results, and the Bayesian analyses were largely ignored.

    The original manuscript only used frequentist statistics to assess the results, and then added Bayesian analyses later in response to a reviewer comment. We agree that the revised discussion did not consider the Bayesian results in enough detail, and have updated the manuscript throughout to more thoroughly incorporate the Bayesian analyses and improve overall readability.

    In the Methods section, we have updated the fMRI analysis – general linear models (GLMs): ROIwise GLMs section to more thoroughly incorporate the Bayesian analyses as follows:

    We compared the full model (H1) comprising trial type, dataset and subject as predictors to the null model (H0) comprising only dataset and subject as predictors. Datasets and subjects were modeled as random factors in both cases. Since effect sizes in fMRI analyses are typically small, we set the scaling parameter on the effect size prior for fixed effects to 0.25, instead of the default of 0.5, which assumes medium effect sizes (note that the same qualitative conclusions would be reached with the default prior setting; Rouder et al., 2009). We divided the BF of the full model by that of the null model to quantify evidence for or against a difference in beta weights for each trial type. To interpret the BFs, we used a modified version of Jeffreys’ scale (Andraszewicz et al., 2014; Jeffreys, 1939). To facilitate interpretation of the BFs, we converted them to the logarithmic scale. The approximate conversion between the interpretation of logarithmic BFs and standard interpretation on the adjusted Jeffreys’ scale can be found in Table 3.
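    As a rough illustration of the conversion just described, the sketch below maps log10 BFs onto evidence categories of an adjusted Jeffreys' scale. The cut-offs here (BF = 3, 10, 30, 100 for moderate, strong, very strong, extreme evidence) are assumptions; the exact boundaries in the paper's Table 3 may differ.

    ```python
    import math

    def log10_bf(bf10):
        """Convert a Bayes Factor (evidence for H1 over H0) to log10 form."""
        return math.log10(bf10)

    def interpret_log10_bf(lbf):
        """Map a log10 BF onto evidence categories. Assumes BF = 3, 10, 30,
        100 mark the moderate, strong, very strong, and extreme boundaries;
        the paper's Table 3 may use slightly different cut-offs."""
        magnitude = abs(lbf)
        # Sign of the log BF gives the direction of the evidence
        direction = "H1" if lbf > 0 else "H0"
        if magnitude < math.log10(3):
            return "anecdotal", direction
        if magnitude < 1:                  # log10(10)
            return "moderate", direction
        if magnitude < math.log10(30):
            return "strong", direction
        if magnitude < 2:                  # log10(100)
            return "very strong", direction
        return "extreme", direction
    ```

    On this mapping, the two BFs the reviewer flagged earlier (0.858 and 3.18) fall on opposite sides of the anecdotal/moderate boundary despite their similar z-statistics.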

    The Bayesian results are also more incorporated into the Discussion as follows:

    Evidence for the role of the basal ganglia in response inhibition comes from a multitude of studies citing significant activation of either the SN, STN or GPe during successful inhibition trials (Aron, 2007; Aron & Poldrack, 2006; Mallet et al., 2016; Nambu et al., 2002; Zhang & Iwaki, 2019). Here, we re-examined activation patterns in the subcortex across five different datasets, identifying differences in regional activation using both frequentist and Bayesian approaches. Broadly, the frequentist approach found significant differences in most ROIs in the FS>GO and FS>SS contrasts, and limited differences in the SS>GO contrast. The Bayesian results were more conservative; while many of the ROIs showed moderate or strong evidence, some with small but significant z-scores were considered only anecdotal by the Bayesian analysis. In our discussion, where the findings of the two analytical approaches differ, we focus mainly on the more conservative Bayesian analysis.

    Here, our multi-study results found limited evidence that the canonical inhibition pathways (the indirect and hyperdirect pathways) are recruited during successful response inhibition in the SST. We expected to find increased activation in the nodes of the indirect pathway (e.g., the preSMA, GPe, STN, SN, GPi, and thalamus) during successful stop compared to go or failed stop trials. We found strong evidence for activation pattern differences in the preSMA, thalamus, and right GPi between the two stop types (failed and successful), and limited evidence, or evidence in favour of the null hypothesis, in the other regions, such as the GPe, STN, and SN. However, we did find recruitment of subcortical nodes (VTA, thalamus, STN, and caudate), as well as preSMA and IFG activation during failed stop trials. We suggest that these results indicate that failing to inhibit one’s action is a larger driver of the utilisation of these nodes than action cancellation itself.

    These results are in contention with many previous fMRI studies of the stop signal task as well as research using other measurement techniques such as local field potential recordings, direct subcortical stimulation, and animal studies, where activation of particularly the STN has consistently been observed (Alegre et al., 2013b; Aron & Poldrack, 2006; Benis et al., 2014; Fischer et al., 2017; Mancini et al., 2019; Wessel et al., 2016).

  4. Author response:

    The following is the authors’ response to the original reviews.

    Reviewer #1:

    This is my first review of the article entitled "The canonical stopping network: Revisiting the role of the subcortex in response inhibition" by Isherwood and colleagues. This study is one in a series of excellent papers by the Forstmann group focusing on the ability of fMRI to reliably detect activity in small subcortical nuclei - in this case, specifically those purportedly involved in the hyper- and indirect inhibitory basal ganglia pathways. I have been very fond of this work for a long time, beginning with the demonstration by De Hollander, Forstmann et al. (HBM 2017) that 3T fMRI (as well as many 7T imaging sequences) does not afford a sufficient signal-to-noise ratio to reliably image these small subcortical nuclei. This work has done a lot to reshape my view of seminal past studies of subcortical activity during inhibitory control, including some that have several thousand citations.

    In the current study, the authors compiled five datasets that aimed to investigate neural activity associated with stopping an already initiated action, as operationalized in the classic stop-signal paradigm. Three of these datasets are taken from their own 7T investigations, and two are datasets from the Poldrack group, which used 3T fMRI.

    The authors make six chief points:

    (1) There does not seem to be a measurable BOLD response in the purportedly critical subcortical areas in contrasts of successful stopping (SS) vs. going (GO), neither across datasets nor within each individual dataset. This includes the STN but also any other areas of the indirect and hyperdirect pathways.

    (2) The failed-stop (FS) vs. GO contrast is the only contrast showing substantial differences in those nodes.

    (3) The positive findings of STN (and other subcortical) activation during the SS vs. GO contrast could be due to the usage of inappropriate smoothing kernels.

    (4) The study demonstrates the utility of aggregating publicly available fMRI data from similar cognitive tasks.

    (5) From the abstract: "The findings challenge previous functional magnetic resonance (fMRI) of the stop-signal task"

    (6) and further: "suggest the need to ascribe a separate function to these networks."

    I strongly and emphatically agree with points 1-5. However, I vehemently disagree with point 6, which appears to be the main thrust of the current paper, based on the discussion, abstract, and - not least - the title.

    To me, this paper essentially shows that fMRI is ill-suited to study the subcortex in the specific context of the stop-signal task. That is not just because of the issues of subcortical small-volume SNR (the main topic of this and related works by this outstanding group), but also because of its limited temporal resolution (which is unacknowledged, but especially impactful in the context of the stop-signal task). I'll expand on what I mean in the following.

    First, the authors are underrepresenting the non-fMRI evidence in favor of the involvement of the subthalamic nucleus (STN) and the basal ganglia more generally in stopping actions.

    - There are many more intracranial local field potential recording studies that show increased STN LFP (or even single-unit) activity in the SS vs. FS and SS vs. GO contrast than listed, which come from at least seven different labs. Here's a (likely non-exhaustive) list of studies that come to mind:

    Ray et al., NeuroImage 2012
    Alegre et al., Experimental Brain Research 2013
    Benis et al., NeuroImage 2014
    Wessel et al., Movement Disorders 2016
    Benis et al., Cortex 2016
    Fischer et al., eLife 2017
    Ghahremani et al., Brain and Language 2018
    Chen et al., Neuron 2020
    Mosher et al., Neuron 2021
    Diesburg et al., eLife 2021

    - Similarly, there is much more evidence than cited that causally influencing STN via deep-brain stimulation also influences action-stopping. Again, the following list is probably incomplete:

    Van den Wildenberg et al., JoCN 2006
    Ray et al., Neuropsychologia 2009
    Hershey et al., Brain 2010
    Swann et al., JNeuro 2011
    Mirabella et al., Cerebral Cortex 2012
    Obeso et al., Exp. Brain Res. 2013
    Georgiev et al., Exp Br Res 2016
    Lofredi et al., Brain 2021
    van den Wildenberg et al, Behav Brain Res 2021
    Wessel et al., Current Biology 2022

    - Moreover, evidence from non-human animals similarly suggests critical STN involvement in action stopping, e.g.:

    Eagle et al., Cerebral Cortex 2008
    Schmidt et al., Nature Neuroscience 2013
    Fife et al., eLife 2017
    Anderson et al., Brain Res 2020

    Together, studies like these provide either causal evidence for STN involvement via direct electrical stimulation of the nucleus or provide direct recordings of its local field potential activity during stopping. This is not to mention the extensive evidence for the involvement of the STN - and the indirect and hyperdirect pathways in general - in motor inhibition more broadly, perhaps best illustrated by their damage leading to (hemi)ballism.

    Hence, I cannot agree with the idea that the current set of findings "suggest the need to ascribe a separate function to these networks", as suggested in the abstract and further explicated in the discussion of the current paper. For this to be the case, we would need to disregard more than a decade's worth of direct recording studies of the STN in favor of a remote measurement of the BOLD response using (provably) sub-ideal imaging parameters. There are myriad explanations of why fMRI may not be able to reveal a potential ground-truth difference in STN activity between the SS and FS/GO conditions, beginning with the simple proposition that it may not afford sufficient SNR, or that perhaps subcortical BOLD is not tightly related to the type of neurophysiological activity that distinguishes these conditions (in the purported case of the stop-signal task, specifically the beta band). But essentially, this paper shows that a specific lens into subcortical activity is likely broken, and then suggests dismissing existing evidence from superior lenses in favor of the findings from the 'broken' lens. That doesn't make much sense to me.

    Second, there is actually another substantial reason why fMRI may indeed be unsuitable to study STN activity, specifically in the stop-signal paradigm: its limited time resolution. The sequence of subcortical processes on each specific trial type in the stop-signal task is purportedly as follows: at baseline, the basal ganglia exert inhibition on the motor system. During motor initiation, this inhibition is lifted via direct pathway innervation. This is when the three trial types start diverging. When actions then have to be rapidly cancelled (SS and FS), cortical regions signal to STN via the hyperdirect pathway that inhibition has to be rapidly reinstated (see Chen, Starr et al., Neuron 2020 for direct evidence for such a monosynaptic hyperdirect pathway, the speed of which directly predicts SSRT). Hence, inhibition is reinstated (too late in the case of FS trials, but early enough in SS trials, see recordings from the BG in Schmidt, Berke et al., Nature Neuroscience 2013; and Diesburg, Wessel et al., eLife 2021).

    Hence, according to this prevailing model, all three trial types involve a sequence of STN activation (initial inhibition), STN deactivation (disinhibition during GO), and STN reactivation (reinstantiation of inhibition during the response via the hyperdirect pathway on SS/FS trials, reinstantiation of inhibition via the indirect pathway after the response on GO trials). What distinguishes the trial types during this period is chiefly the relative timing of the inhibitory process (earliest on SS trials, slightly later on FS trials, latest on GO trials). However, these temporal differences play out on a level of hundreds of milliseconds, and in all three cases, processing concludes well under a second overall. To fMRI, given its limited time resolution, these activations are bound to look quite similar.
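    The reviewer's temporal-resolution point can be made concrete with a small simulation: events whose onsets differ by a few hundred milliseconds produce nearly indistinguishable BOLD regressors once convolved with a canonical haemodynamic response function. The double-gamma parameters, TR, and onsets below are illustrative assumptions, not values from any of the five datasets.

    ```python
    import numpy as np
    from math import gamma

    def double_gamma_hrf(t, p1=6.0, p2=16.0, ratio=1 / 6):
        """Canonical double-gamma HRF; SPM-style parameters assumed."""
        def g(t, shape):
            t = np.maximum(t, 0.0)
            return t ** (shape - 1) * np.exp(-t) / gamma(shape)
        return g(t, p1) - ratio * g(t, p2)

    dt = 0.01                          # high-resolution grid (s)
    hrf = double_gamma_hrf(np.arange(0, 30, dt))

    def bold_regressor(onsets, duration=300.0, tr=1.5):
        """Convolve delta events with the HRF and sample at the TR."""
        grid = np.zeros(int(round(duration / dt)))
        for onset in onsets:
            grid[int(round(onset / dt))] = 1.0
        conv = np.convolve(grid, hrf)[:len(grid)]
        return conv[::int(round(tr / dt))]

    # Two event streams offset by 300 ms: the sampled regressors are
    # nearly collinear, so a GLM can barely tell them apart.
    a = bold_regressor([10.0, 50.0, 90.0])
    b = bold_regressor([10.3, 50.3, 90.3])
    r = np.corrcoef(a, b)[0, 1]
    ```

    The near-perfect correlation between the two regressors illustrates why sub-second differences in the relative timing of the inhibitory process are effectively invisible to the BOLD signal.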

    Lastly, further building on this logic, it's not surprising that FS trials yield increased activity compared to SS and GO trials. That's because FS trials are errors, which are known to activate the STN (Cavanagh et al., JoCN 2014; Siegert et al. Cortex 2014) and afford additional inhibition of the motor system after their occurrence (Guan et al., JNeuro 2022). Again, fMRI will likely conflate this activity with the abovementioned sequence, resulting in a summation of activity and the highest level of BOLD for FS trials.

    In sum, I believe this study has a lot of merit in demonstrating that fMRI is ill-suited to study the subcortex during the SST, but I cannot agree that it warrants any reappreciation of the subcortex's role in stopping, our understanding of which is not chiefly based on fMRI evidence.

    We would like to thank reviewer 1 for their insightful and helpful comments. We have responded point-by-point below and will give an overview of how we reframed the paper here.

    We agree that there is good evidence from other sources for the presence of the canonical stopping network (indirect and hyperdirect) during action cancellation, and that this should be reflected more in the paper. However, we do not believe that a lack of evidence for this network during the SST makes fMRI ill-suited for studying this task, or other tasks that have neural processes occurring in quick succession. What we believe the activation patterns of fMRI reflect during this task is the large amount of activation caused by failed stops. That is, the role of the STN in error processing may be more pronounced than its role in action cancellation. Due to the replicability of fMRI results, especially at higher field strengths, we believe the activation profile of failed stop trials reflects a paramount role for the STN in error processing. Therefore, while we agree we do not provide evidence against the role of the STN in action cancellation, we do provide evidence that our outlook on subcortical activation during different trial types of this task should be revisited. We have reframed the article to reflect this, and discuss points such as fMRI reliability, validity and the complex overlapping of cognitive processes in the SST in the discussion. Please see all changes to the article indicated by red text.

    A few other points:

    - As I said before, this team's previous work has done a lot to convince me that 3T fMRI is unsuitable to study the STN. As such, it would have been nice to see a combination of the subsamples of the study that DID use imaging protocols and field strengths suitable to actually study this node. This is especially true since the second 3T sample (and arguably, the Isherwood_7T sample) does not afford a lot of trials per subject, to begin with.

    Unfortunately, this study already comprises the only open-access 7T datasets available for the SST. Therefore, unless we combined only the deHollander_7T and Miletic_7T subsamples, there is no additional analysis we can do for this right now. While looking at just the subsamples that were 7T and had >300 trials would be interesting, based on the new framing of the paper we do not believe it adds to the study, as the subsamples still lack the temporal resolution seemingly required for examining the processes in the SST.

    - What was the GLM analysis time-locked to on SS and FS trials? The stop-signal or the GO-signal?

    SS and FS trials were time-locked to the GO signal as this is standard practice. The main reason for this is that we use contrasts to interpret differences in activation patterns between conditions. By time-locking the FS and SS trials to the stop signal, we are contrasting events at different time points, and therefore different stages of processing, which introduces its own sources of error. We agree with the reviewer, however, that a separate analysis with time-locking on the stop-signal has its own merit, and now include results in the supplementary material where the FS and SS trials are time-locked to the stop signal as well.
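    To make the distinction concrete, the sketch below builds HRF-convolved regressors for a single hypothetical failed-stop trial, time-locked either to the GO signal or to the stop signal (GO onset + SSD). This is a minimal numpy analogue of the idea, not the FSL pipeline used in the paper; the double-gamma HRF parameters, TR, and the 0.25 s SSD are illustrative assumptions.

    ```python
    import numpy as np
    from math import gamma as gamma_fn

    def gamma_pdf(t, shape):
        """Gamma density with unit scale, evaluated at times t (seconds)."""
        t = np.asarray(t, dtype=float)
        out = np.zeros_like(t)
        pos = t > 0
        out[pos] = t[pos] ** (shape - 1) * np.exp(-t[pos]) / gamma_fn(shape)
        return out

    def hrf(t):
        """Simple double-gamma HRF: positive peak near 5 s, undershoot near 15 s."""
        h = gamma_pdf(t, 6) - gamma_pdf(t, 16) / 6
        return h / h.max()

    def regressor(onsets, n_scans, tr=1.0, dt=0.1):
        """Impulse at each onset, convolved with the HRF, resampled to the TR grid."""
        grid = np.zeros(int(n_scans * tr / dt))
        for onset in onsets:
            grid[int(round(onset / dt))] = 1.0
        kernel = hrf(np.arange(0.0, 32.0, dt))
        conv = np.convolve(grid, kernel)[: len(grid)]
        return conv[(np.arange(n_scans) * tr / dt).astype(int)]

    # hypothetical failed-stop trial: GO signal at 10 s, SSD of 0.25 s
    go_locked = regressor([10.0], n_scans=40, tr=1.0)
    stop_locked = regressor([10.25], n_scans=40, tr=1.0)
    ```

    Because the two regressors are shifted copies of the same response, contrasting a GO-locked event against a stop-locked one compares time points at slightly different stages of processing, which is the source of error referred to above.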

    - Why was SSRT calculated using the outdated mean method?

    We originally calculated SSRT using the mean method as this was how it was reported in the oldest of the aggregated studies. We have now re-calculated the SSRTs using the integration method with go omission replacement and thank the reviewer for pointing this out. Please see response to comment 3.

    - The authors chose 3.1 as a z-score to "ensure conservatism", but since they are essentially trying to prove the null hypothesis that there is no increased STN activity on SS trials, I would suggest erring on the side of a more lenient threshold to avoid type-2 error.

    We have used minimum FDR-corrected thresholds for each contrast now, instead of using a blanket conservative threshold of 3.1 over all contrasts. The new thresholds for each contrast are shown in text. Please see below (page 12):

    “The thresholds for each contrast are as follows: 3.01 for FS > GO, 2.26 for FS > SS and 3.1 for SS > GO.”
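    For readers unfamiliar with how a map-specific threshold such as 2.26 or 3.01 arises, the sketch below derives an FDR-corrected z threshold from a map of z-statistics, assuming the Benjamini-Hochberg step-up procedure on one-sided voxelwise p-values. It is an illustrative re-implementation under those assumptions, not the exact routine used in the paper's pipeline.

    ```python
    import numpy as np
    from math import erfc, sqrt

    def fdr_z_threshold(z_map, q=0.05):
        """Benjamini-Hochberg FDR on a map of z-statistics.

        Returns the smallest z-value that survives correction (the effective
        per-contrast threshold), or None if no voxel survives.
        """
        z = np.asarray(z_map, dtype=float).ravel()
        p = np.array([0.5 * erfc(v / sqrt(2.0)) for v in z])  # one-sided p-values
        order = np.argsort(p)
        m = len(p)
        crit = q * np.arange(1, m + 1) / m                    # BH critical values
        below = p[order] <= crit
        if not below.any():
            return None
        k = int(np.max(np.where(below)[0]))                   # largest i with p_(i) <= q*i/m
        return float(z[order[: k + 1]].min())
    ```

    Because the threshold depends on the distribution of p-values within each map, different contrasts naturally end up with different effective thresholds, as in the quoted text.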

    - The authors state that "The results presented here add to a growing literature exposing inconsistencies in our understanding of the networks underlying successful response inhibition". It would be helpful if the authors cited these studies and what those inconsistencies are.

    We thank reviewer 1 for their detailed and thorough evaluation of our paper. Overall, we agree that there is substantial direct and indirect evidence for the involvement of the cortico-basal-ganglia pathways in response inhibition. We have taken this constructive criticism on board and agree with the reviewer that the paper should be reframed; the thoroughness of their helpful comments greatly aided the revision of the paper.

    (1) I would suggest reframing the study, abstract, discussion, and title to reflect the fact that the study shows that fMRI is unsuitable to study subcortical activity in the SST, rather than the fact that we need to question the subcortical model of inhibition, given the reasons in my public review.

    We agree with the reviewer that the article should be reframed and not taken as direct evidence against the large sum of literature pointing towards the involvement of the cortico-basal-ganglia pathway in response inhibition. We have significantly rewritten the article in light of this.

    (2) I suggest combining the datasets that provide the best imaging parameters and then analyzing the subcortical ROIs with a more lenient threshold and with regressors time-locked to the stop-signals (if that's not already the case). This would make the claim of a null finding much more impactful. Some sort of power analysis and/or Bayes factor analysis of evidence for the null would also be appreciated.

    Instead of using a blanket conservative threshold of 3.1, we instead used only FDR-corrected thresholds. The threshold level is therefore different for each contrast and noted in the figures. We have also added supplementary figures including the group-level SPMs and ROI analyses when the FS and SS trials were time-locked to the stop signal instead of the GO signal (Supplementary Figs 4 & 5). But as mentioned above, due to the difference in time points when contrasting, we believe that time-locking to the GO signal for all trial types makes more sense for the main analysis.

    We have now also computed BFs on the first level ROI beta estimates for all contrasts using the BayesFactor package as implemented in R. We add the following section to the methods and updated the results section accordingly (page 8):

    “In addition to the frequentist analysis we also opted to compute Bayes Factors (BFs) for each contrast per ROI per hemisphere. To do this, we extracted the beta weights for each individual trial type from our first level model. We then compared the beta weights from each trial type to one another using the ‘BayesFactor’ package as implemented in R (Morey & Rouder, 2015). We compared the full model, comprising trial type, dataset and subject as predictors, to the null model, comprising only dataset and subject as predictors. The datasets and subjects were modeled as random factors. We divided the resultant BFs of the full model by those of the null model to quantify evidence for or against a difference in beta weights between trial types. To interpret the BFs, we used a modified version of Jeffreys’ scale (Jeffreys, 1939; Lee & Wagenmakers, 2014).”
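    As a rough illustration of the model comparison described above, the sketch below applies the BIC approximation to the Bayes factor on simulated beta weights. It is a deliberate simplification of the R ‘BayesFactor’ approach: subject is entered as fixed dummy regressors rather than a random factor, the JZS priors are replaced by the BIC approximation, and the data, effect size and sample size are invented for the example.

    ```python
    import numpy as np

    def bic(y, X):
        """BIC of an ordinary least-squares fit under a Gaussian likelihood."""
        n = len(y)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        return n * np.log(rss / n) + X.shape[1] * np.log(n)

    def bf10_bic(y, X_full, X_null):
        """BIC approximation to the Bayes factor: BF10 ~ exp((BIC_null - BIC_full) / 2)."""
        return float(np.exp((bic(y, X_null) - bic(y, X_full)) / 2.0))

    # simulated beta weights: 20 subjects x 2 trial types, true effect of 0.8
    rng = np.random.default_rng(1)
    n_sub = 20
    subject = np.repeat(np.arange(n_sub), 2)
    trial = np.tile([0.0, 1.0], n_sub)
    y = (0.8 * trial
         + rng.standard_normal(n_sub)[subject] * 0.3   # per-subject offsets
         + rng.standard_normal(2 * n_sub) * 0.5)       # trial-level noise

    dummies = (subject[:, None] == np.arange(1, n_sub)).astype(float)
    X_null = np.column_stack([np.ones(2 * n_sub), dummies])   # subject regressors only
    X_full = np.column_stack([X_null, trial])                 # ... plus trial type
    bf = bf10_bic(y, X_full, X_null)  # BF > 1 favours a trial-type effect
    ```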

    (3) I suggest calculating SSRT using the integration method with the replacement of Go omissions, as per the most recent recommendation (Verbruggen et al., eLife 2019).

    We agree we should have used a more optimal method for SSRT estimation. We have replaced our original estimations with that of the integration method with go omissions replacement, as suggested and adapted the results in table 3.

    We have also replaced text in the methods sections to reflect this (page 5):

    “For each participant, the SSRT was calculated using the mean method, estimated by subtracting the mean SSD from median go RT (Aron & Poldrack, 2006; Logan & Cowan, 1984).”

    Now reads:

    “For each participant, the SSRT was calculated using the integration method with replacement of go omissions (Verbruggen et al., 2019), estimated by integrating the RT distribution and calculating the point at which the integral equals p(respond|signal). The completion time of the stop process aligns with the nth RT, where n equals the number of RTs in the RT distribution of go trials multiplied by the probability of responding to a signal.”
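    The description above can be written out directly. The sketch below is a minimal Python implementation of the integration method with replacement of go omissions, assuming go omissions are coded as NaN and that p(respond|signal) and the mean SSD have already been computed; it is an illustration, not the authors' analysis code.

    ```python
    import numpy as np

    def ssrt_integration(go_rts, p_respond_signal, mean_ssd):
        """SSRT via the integration method with replacement of go omissions
        (Verbruggen et al., 2019).

        go_rts           : go-trial RTs in seconds, omissions coded as NaN
        p_respond_signal : probability of responding on stop-signal trials
        mean_ssd         : mean stop-signal delay in seconds
        """
        rts = np.asarray(go_rts, dtype=float)
        # replace go omissions with the maximum observed RT
        rts = np.where(np.isnan(rts), np.nanmax(rts), rts)
        rts = np.sort(rts)
        # nth RT, where n = number of go RTs * p(respond | signal)
        n = int(np.ceil(len(rts) * p_respond_signal))
        nth_rt = rts[max(n - 1, 0)]
        return nth_rt - mean_ssd

    # toy example: go RTs of 0.1 .. 1.0 s, p(respond|signal) = 0.5, mean SSD = 0.2 s
    ssrt = ssrt_integration([0.1 * i for i in range(1, 11)], 0.5, 0.2)  # 0.5 - 0.2 = 0.3 s
    ```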

    Reviewer #2:

    This work aggregates data across 5 openly available stopping studies (3 at 7 tesla and 2 at 3 tesla) to evaluate activity patterns across the common contrasts of Failed Stop (FS) > Go, FS > stop success (SS), and SS > Go. Previous work has implicated a set of regions that tend to be positively active in one or more of these contrasts, including the bilateral inferior frontal gyrus, preSMA, and multiple basal ganglia structures. However, the authors argue that upon closer examination, many previous papers have not found subcortical structures to be more active on SS than FS trials, bringing into question whether they play an essential role in (successful) inhibition. In order to evaluate this with more data and power, the authors aggregate across five datasets and find many areas that are *more* active for FS than SS, specifically bilateral preSMA, caudate, GPE, thalamus, and VTA, and unilateral M1, GPi, putamen, SN, and STN. They argue that this brings into question the role of these areas in inhibition, based upon the assumption that areas involved in inhibition should be more active on successful stop than failed stop trials, not the opposite as they observed.

    As an empirical result, I believe that the results are robust, but this work does not attempt a new theoretical synthesis of the neuro-cognitive mechanisms of stopping. Specifically, if these many areas are more active on failed stop than successful stop trials, and (at least some of) these areas are situated in pathways that are traditionally assumed to instantiate response inhibition like the hyperdirect pathway, then what function are these areas/pathways involved in? I believe that this work would make a larger impact if the authors endeavored to synthesize these results into some kind of theoretical framework for how stopping is instantiated in the brain, even if that framework may be preliminary.

    I also have one main concern about the analysis. The authors use the mean method for computing SSRT, but this has been shown to be more susceptible to distortion from RT slowing (Verbruggen, Chambers & Logan, 2013 Psych Sci), and goes against the consensus recommendation of using the integration with replacement method (Verbruggen et al., 2019). Therefore, I would strongly recommend replacing all mean SSRT estimates with estimates using the integration with replacement method.

    I found the paper clearly written and empirically strong. As I mentioned in the public review, I believe that the main shortcoming is the lack of theoretical synthesis. I would encourage the authors to attempt to synthesize these results into some form of theoretical explanation. I would also encourage replacing the mean method with the integration with replacement method for computing SSRT. I also have the following specific comments and suggestions (in the approximate order in which they appear in the manuscript) that I hope can improve the manuscript:

    We would like to thank reviewer 2 for their insightful and interesting comments. We have adapted our paper to reflect these comments. Please see direct responses to your comments below. We agree with the reviewer that some type of theoretical synthesis would help with the interpretability of the article. We have substantially reworked the discussion and included theoretical considerations behind the newer narrative. Please see all changes to the article indicated by red text.

    (1) The authors say "performance on successful stop trials is quantified by the stop signal reaction time". I don't think this is technically accurate. SSRT is a measure of the average latency of the stop process for all trials, not just for the trials in which subjects successfully stop.

    Thank you for pointing out this technically incorrect statement. We have replaced the above sentence with the following (page 1):

    “Inhibition performance in the SST as a whole is quantified by the stop signal reaction time (SSRT), which estimates the speed of the latent stopping process (Verbruggen et al., 2019).”

    (2) The authors say "few studies have detected differences in the BOLD response between FS and SS trials", but then do not cite any papers that detected differences until several sentences later (de Hollander et al., 2017; Isherwood et al., 2023; Miletic et al., 2020). If these are the only ones, and they only show greater FS than SS, then I think this point could be made more clearly and directly.

    We have moved the citations to the correct place in the text to be clearer. We have also rephrased this part of the introduction to make the points more direct (page 2).

    “In the subcortex, functional evidence is relatively inconsistent. Some studies have found an increase in BOLD response in the STN in SS > GO contrasts (Aron & Poldrack, 2006; Coxon et al., 2016; Gaillard et al., 2020; Yoon et al., 2019), but others have failed to replicate this (Bloemendaal et al., 2016; Boehler et al., 2010; Chang et al., 2020; B. Xu et al., 2015). Moreover, some studies have actually found higher STN, SN and thalamic activation in failed stop trials, not successful ones (de Hollander et al., 2017; Isherwood et al., 2023; Miletić et al., 2020).”

    (3) Unless I overlooked it, I don't believe that the author specified the criterion that any given subject is excluded based upon. Given some studies have significant exclusions (e.g., Poldrack_3T), I think being clear about how many subjects violated each criterion would be useful.

    This is indeed interesting and important information to include. We have added the number of participants who were excluded for each criterion. Please see added text below (page 4):

    “Based on these criteria, no subjects were excluded from the Aron_3T dataset. 24 subjects were excluded from the Poldrack_3T dataset (3 based on criterion 1, 9 on criterion 2, 11 on criterion 3, and 8 on criterion 4). Three subjects were excluded from the deHollander_7T dataset (2 based on criterion 1 and 1 on criterion 2). Five subjects were excluded from the Isherwood_7T dataset (2 based on criterion 1, 1 on criterion 2, and 2 on criterion 4). Two subjects were excluded from the Miletic_7T dataset (1 based on criterion 2 and 1 on criterion 4). Note that some participants in the Poldrack_3T study failed to meet multiple inclusion criteria.”

    (4) The Method section included very exhaustive descriptions of the neuroimaging processing pipeline, which was appreciated. However, it seems that much of what is presented is not actually used in any of the analyses. For example, it seems that "functional data preprocessing" section may be fMRIPrep boilerplate, which again is fine, but I think it would help to clarify that much of the preprocessing was not used in any part of the analysis pipeline for any results. For example, at first blush, I thought the authors were using global signal regression, but after a more careful examination, I believe that they are only computing global signals but never using them. Similarly with tCompCor seemingly being computed but not used. If possible, I would recommend that the authors share code that instantiates their behavioral and neuroimaging analysis pipeline so that any confusion about what was actually done could be programmatically verified. At a minimum, I would recommend more clearly distinguishing the pipeline steps that actually went into any presented analyses.

    We thank the reviewer for finding this inconsistency. The methods section indeed uses the fMRIPrep boilerplate text, which we included so as to be as accurate as possible when describing the preprocessing steps taken. While we believe leaving the exact boilerplate text that fMRIPrep provides is the most accurate way to report our preprocessing, we have adapted some of the text to clarify which computations were not used in the subsequent analysis. As a side note, for future reference, we would like to add that the fMRIPrep authors expressly recommend users report the boilerplate completely and unaltered, and as such, we believe this may become a recurring issue (page 7).

    “While many regressors were computed in the preprocessing of the fMRI data, not all were used in the subsequent analysis. The exact regressors used for the analysis can be found above. For example, tCompCor and global signals were calculated in our generic preprocessing pipeline but not part of the analysis. The code used for preprocessing and analysis can be found in the data and code availability statement.”

    (5) What does it mean for the Poldrack_3T to have N/A for SSD range? Please clarify.

    Thank you for pointing out this omission. We had not yet found the possible SSD range for this study. We have replaced this value with the correct value (0 – 1000 ms).

    (6) The SSD range of 0-2000ms for deHollander_7T and Miletic_7T seems very high. Was this limit ever reached or even approached? SSD distributions could be a useful addition to the supplement.

    Thank you for also bringing this mistake to light. We had accidentally placed the max trial duration in these fields instead of the max allowable SSD value. We have replaced it with the correct value (0 – 900 ms).

    (7) The author says "In addition, median go RTs did not correlate with mean SSRTs within datasets (Aron_3T: r = .411, p = .10, BF = 1.41; Poldrack_3T: r = .011, p = .91, BF = .23; deHollander_7T: r = -.30, p = .09, BF = 1.30; Isherwood_7T: r = .13, p = .65, BF = .57; Miletic_7T: r = .37, p = .19, BF = 1.02), indicating independence between the stop and go processes, an important assumption of the horse-race model (Logan & Cowan, 1984)." However, the independent race model assumes context independence (the finishing time of the go process is not affected by the presence of the stop process) and stochastic independence (the duration of the go and stop processes are independent on a given trial). This analysis does not seem to evaluate either of these forms of independence, as it correlates RT and SSRT across subjects, so it was unclear how this analysis evaluated either of the types of independence that are assumed by the independent race model. Please clarify or remove.

    Thank you for this comment. We realize that this analysis indeed does not evaluate either context or stochastic independence and therefore we have removed this from the manuscript.

    (8) The RTs in Isherwood_7T are considerably slower than the other studies, even though the go stimulus+response is the same (very simple) stimulus-response mapping from arrows to button presses. Is there any difference in procedure or stimuli that might explain this difference? It is the only study with a visual stop signal, but to my knowledge, there is no work suggesting visual stop signals encourage more proactive slowing. If possible, I think a brief discussion of the unusually slow RTs in Isherwood_7T would be useful.

    We have included the following text in the manuscript to reflect this observed difference in RT between the Isherwood_7T dataset and the other datasets (page 9).

    “Longer RTs were found in the Isherwood_7T dataset in comparison to the four other datasets. The only difference in procedure in the Isherwood_7T dataset is the use of a visual stop signal as opposed to an auditory stop signal. This RT difference is consistent with previous research, where auditory stop signals and visual go stimuli have been associated with faster RTs compared to unimodal visual presentation (Carrillo-de-la-Peña et al., 2019; Weber et al., 2024). The mean SSRTs and probability of stopping are within normal range, indicating that participants understood the task and responded in the expected manner.”

    (9) When the authors included both 3T and 7T data, I thought they were preparing to evaluate the effect of magnet strength on stop networks, but they didn't do this analysis. Is this because the authors believe there is insufficient power? It seems that this could be an interesting exploratory analysis that could improve the paper.

    We thank the reviewer for this interesting comment. As our dataset sample contains only two 3T and three 7T datasets we indeed believe there is insufficient power to warrant such an analysis. In addition, we wanted the focus of this paper to be how fMRI examines the SST in general, and not differences between acquisition methods. With a greater number of datasets with different imaging parameters (especially TE or resolution) in addition to field strength, we agree such an analysis would be interesting, although beyond the scope of this article.

    (10) The authors evaluate smoothing and it seems that the conclusion that they want to come to is that with a larger smoothing kernel, the results in the stop networks bleed into surrounding areas, producing false positive activity. However, in the absence of a ground truth of the true contributions of these areas, it seems that an alternative interpretation of the results is that the denser maps when using a larger smoothing kernel could be closer to "true" activation, with the maps using a smaller smoothing kernel missing some true activity. It seems worth entertaining these two possible interpretations for the smoothing results unless there is clear reason to conclude that the smoothed results are producing false positive activity.

    We agree with the view of the reviewer on the interpretation of the smoothing results. We indeed cannot rule this out as a possible interpretation of the results, due to a lack of ground truth. We have added text to the article to reflect this view and discuss the types of errors we can expect for both smaller and larger smoothing kernels (page 15).

    “In the absence of a ground truth, we are not able to fully justify the use of either larger or smaller kernels to analyse such data. On the one hand, overly large smoothing kernels could lead to false positives in activation profiles, due to bleeding of observed activation into surrounding tissues. On the other hand, too little smoothing could lead to false negatives, missing true activity in surrounding regions. While we cannot concretely validate either choice, it should be noted that there is lower spatial uncertainty in the subcortex compared to the cortex, due to the lower anatomical variability. False positives from smoothing spatially unmatched signal are therefore more likely than false negatives. It may be more prudent for studies to use a range of smoothing kernels, to assess the robustness of their fMRI activation profiles.”
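    To make the smoothing trade-off concrete, the sketch below converts a kernel's FWHM to a Gaussian sigma and smooths a 1-D point "activation" with two illustrative kernels. It is a simplified 1-D analogue of 3-D volumetric smoothing; the FWHM values and voxel size are assumptions chosen for the example.

    ```python
    import numpy as np

    def fwhm_to_sigma(fwhm_mm, voxel_mm):
        """Convert a kernel FWHM in mm to a Gaussian sigma in voxels."""
        return fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm

    def gaussian_smooth_1d(signal, sigma_vox):
        """Minimal 1-D Gaussian smoothing by direct convolution."""
        radius = int(np.ceil(4 * sigma_vox))
        x = np.arange(-radius, radius + 1)
        kernel = np.exp(-0.5 * (x / sigma_vox) ** 2)
        kernel /= kernel.sum()
        return np.convolve(signal, kernel, mode="same")

    # a point activation in a 101-voxel line of 1 mm voxels
    impulse = np.zeros(101)
    impulse[50] = 1.0
    narrow = gaussian_smooth_1d(impulse, fwhm_to_sigma(2.0, 1.0))  # 2 mm kernel
    broad = gaussian_smooth_1d(impulse, fwhm_to_sigma(8.0, 1.0))   # 8 mm kernel
    ```

    With the broader kernel, the peak is attenuated and signal spreads into voxels ten positions away that the narrow kernel leaves untouched, which is the bleeding-into-surrounding-tissue concern raised in the quoted text.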

  5. eLife assessment

    This study aggregates across five fMRI datasets and reports that a network of brain areas previously associated with response inhibition processes, including several in the basal ganglia, are more active on failed stop than successful stop trials. This study is valuable as a well-powered investigation of fMRI measures of stopping. However, evidence for the authors' conclusions regarding the role of subcortical nodes in stopping is incomplete, due to the limitations in the fMRI analysis.

  6. Reviewer #1 (Public Review):

    This study is one in a series of excellent papers by the Forstmann group focusing on the ability of fMRI to reliably detect activity in small subcortical nuclei - in this case, specifically those purportedly involved in the hyper- and indirect inhibitory basal ganglia pathways. I have been very fond of this work for a long time, beginning with the demonstration of De Hollander, Forstmann et al. (HBM 2017) of the fact that 3T fMRI imaging (as well as many 7T imaging sequences) do not afford sufficient signal to noise ratio to reliably image these small subcortical nuclei. This work has done a lot to reshape my view of seminal past studies of subcortical activity during inhibitory control, including some that have several thousand citations.

    Comments on revised version:

    This is my second review of this article, now entitled "Multi-study fMRI outlooks on subcortical BOLD responses in the stop-signal paradigm" by Isherwood and colleagues.

    The authors have been very responsive to the initial round of reviews.

    I still think it would be helpful to see a combined investigation of the available 7T data, just to really drive the point home that even with the best parameters and a multi-study sample size, fMRI cannot detect any increases in BOLD activity on successful stop compared to go trials. However, I agree with the authors that these "sub samples still lack the temporal resolution seemingly required for looking at the processes in the SST."

    As such, I don't have any more feedback.

  7. Reviewer #2 (Public Review):

    This work aggregates data across 5 openly available stopping studies (3 at 7 tesla and 2 at 3 tesla) to evaluate activity patterns across the common contrasts of Failed Stop (FS) > Go, FS > stop success (SS), and SS > Go. Previous work has implicated a set of regions that tend to be positively active in one or more of these contrasts, including the bilateral inferior frontal gyrus, preSMA, and multiple basal ganglia structures. However, the authors argue that upon closer examination, many previous papers have not found subcortical structures to be more active on SS than FS trials, bringing into question whether they play an essential role in (successful) inhibition. In order to evaluate this with more data and power, the authors aggregate across five datasets and find many areas that are *more* active for FS than SS, including bilateral preSMA, GPE, thalamus, and VTA. They argue that this brings into question the role of these areas in inhibition, based upon the assumption that areas involved in inhibition should be more active on successful stop than failed stop trials, not the opposite as they observed.

    Since the initial submission, the authors have improved their theoretical synthesis and changed their SSRT calculation method to the more appropriate integration method with replacement for go omissions. They have also done a better job of explaining how these fMRI results situate within the broader response inhibition literature including work using other neuroscience methods.

    They have also included a new Bayes Factor analysis. In the process of evaluating this new analysis, I recognized the following comments that I believe justify additional analyses and discussion:

    First, if I understand the author's pipeline, for the ROI analyses it is not appropriate to run FSL's FILM method on the data that were generated by repeating the same time series across all voxels of an ROI. FSL's FILM uses neighboring voxels in parts of the estimation to stabilize temporal correlation and variance estimates and was intended and evaluated for use on voxelwise data. Instead, I believe it would be more appropriate to average the level 1 contrast estimates over the voxels of each ROI to serve as the dependent variables in the ROI analysis.

    Second, for the group-level ROI analyses there seem to be inconsistencies when comparing the z-statistics (Figure 3) to the Bayes Factors (Figure 4), in that very similar z-statistics have very different Bayes Factors within the same contrast across different brain areas, which seemed surprising (e.g., a z of 6.64 has a BF of .858 while another with a z of 6.76 has a BF of 3.18). The authors do briefly discuss some instances in which the frequentist and Bayesian results differ, but they never explain why similar z-stats yield very different Bayes Factors for a given contrast across different brain areas. I believe a discussion of this would be useful.

    Third, since the Bayes Factor analysis appears to be based on repeated measures ANOVA and the z-statistics are from Flame1+2, the BayesFactor analysis model does not pair with the frequentist analysis model very cleanly. To facilitate comparison, I would recommend that the same repeated measures ANOVA model should be used in both cases. My reading of the literature is that there is no need to be concerned about any benefits of using Flame being lost, since heteroscedasticity does not impact type I errors and will only potentially impact power (Mumford & Nichols, 2009 NeuroImage).

    Fourth, though frequentist statistics suggest that many basal ganglia structures are significantly more active in the FS > SS contrast (see 2nd row of Figure 3), the Bayesian analyses are much more equivocal, with no basal ganglia areas showing Log10BF > 1 (which would be indicative of strong evidence). The authors suggest that "the frequentist and Bayesian analyses are most in line with one another", but in my view, this frequentist vs. Bayesian analysis for the FS > SS contrast seems to suggest substantially different conclusions. More specifically, the frequentist analyses suggest greater activity in FS than SS in most basal ganglia ROIs (all but 2), but the Bayesian analysis did not find *any* basal ganglia ROIs with strong evidence for the alternative hypothesis (or a difference), and several with more evidence for the null than the alternative hypothesis. This difference between the frequentist and Bayesian analyses seems to warrant discussion, but unless I overlooked it, the Bayesian analyses are not mentioned in the Discussion at all. In my view, the frequentist analyses are treated as the results, and the Bayesian analyses were largely ignored.

    Overall, I think this paper makes a useful and mostly solid contribution to the literature. I have made some suggestions for adjustments and clarification of the neuroimaging pipeline and Bayesian analyses that I believe would strengthen the work further.

  8. eLife assessment

    This study aggregates across five fMRI datasets and reports that a network of brain areas previously associated with response inhibition processes, including several in the basal ganglia, are more active on failed stop than successful stop trials. This study is valuable as a well-powered investigation of fMRI measures of stopping. However, evidence for the authors' conclusions regarding the role of subcortical nodes in stopping is incomplete, due to the limitations of fMRI and a lack of theoretical synthesis.

  9. Reviewer #1 (Public Review):

    This is my first review of the article entitled "The canonical stopping network: Revisiting the role of the subcortex in response inhibition" by Isherwood and colleagues. This study is one in a series of excellent papers by the Forstmann group focusing on the ability of fMRI to reliably detect activity in small subcortical nuclei - in this case, specifically those purportedly involved in the hyper- and indirect inhibitory basal ganglia pathways. I have been very fond of this work for a long time, beginning with the demonstration of De Hollander, Forstmann et al. (HBM 2017) of the fact that 3T fMRI imaging (as well as many 7T imaging sequences) do not afford sufficient signal to noise ratio to reliably image these small subcortical nuclei. This work has done a lot to reshape my view of seminal past studies of subcortical activity during inhibitory control, including some that have several thousand citations.

    In the current study, the authors compiled five datasets that aimed to investigate neural activity associated with stopping an already initiated action, as operationalized in the classic stop-signal paradigm. Three of these datasets are taken from their own 7T investigations, and two are datasets from the Poldrack group, which used 3T fMRI.

    The authors make six chief points:
    1. There does not seem to be a measurable BOLD response in the purportedly critical subcortical areas in contrasts of successful stopping (SS) vs. going (GO), neither across datasets nor within each individual dataset. This includes the STN but also any other areas of the indirect and hyperdirect pathways.
    2. The failed-stop (FS) vs. GO contrast is the only contrast showing substantial differences in those nodes.
    3. The positive findings of STN (and other subcortical) activation during the SS vs. GO contrast could be due to the usage of inappropriate smoothing kernels.
    4. The study demonstrates the utility of aggregating publicly available fMRI data from similar cognitive tasks.
    5. From the abstract: "The findings challenge previous functional magnetic resonance (fMRI) of the stop-signal task"
    6. and further: "suggest the need to ascribe a separate function to these networks."

    I strongly and emphatically agree with points 1-5. However, I vehemently disagree with point 6, which appears to be the main thrust of the current paper, based on the discussion, abstract, and - not least - the title.

    To me, this paper essentially shows that fMRI is ill-suited to study the subcortex in the specific context of the stop-signal task. That is not just because of the issues of subcortical small-volume SNR (the main topic of this and related works by this outstanding group), but also because of its limited temporal resolution (which is unacknowledged, but especially impactful in the context of the stop-signal task). I'll expand on what I mean in the following.

    First, the authors are underrepresenting the non-fMRI evidence in favor of the involvement of the subthalamic nucleus (STN) and the basal ganglia more generally in stopping actions.
    - There are many more intracranial local field potential recording studies that show increased STN LFP (or even single-unit) activity in the SS vs. FS and SS vs. GO contrast than listed, which come from at least seven different labs. Here's a (likely non-exhaustive) list of studies that come to mind:
    o Ray et al., NeuroImage 2012
    o Alegre et al., Experimental Brain Research 2013
    o Benis et al., NeuroImage 2014
    o Wessel et al., Movement Disorders 2016
    o Benis et al., Cortex 2016
    o Fischer et al., eLife 2017
    o Ghahremani et al., Brain and Language 2018
    o Chen et al., Neuron 2020
    o Mosher et al., Neuron 2021
    o Diesburg et al., eLife 2021
    - Similarly, there is much more evidence than cited that causally influencing STN via deep-brain stimulation also influences action-stopping. Again, the following list is probably incomplete:
    o Van den Wildenberg et al., JoCN 2006
    o Ray et al., Neuropsychologia 2009
    o Hershey et al., Brain 2010
    o Swann et al., JNeuro 2011
    o Mirabella et al., Cerebral Cortex 2012
    o Obeso et al., Exp. Brain Res. 2013
    o Georgiev et al., Exp Br Res 2016
    o Lofredi et al., Brain 2021
    o van den Wildenberg et al, Behav Brain Res 2021
    o Wessel et al., Current Biology 2022
    - Moreover, evidence from non-human animals similarly suggests critical STN involvement in action stopping, e.g.:
    o Eagle et al., Cerebral Cortex 2008
    o Schmidt et al., Nature Neuroscience 2013
    o Fife et al., eLife 2017
    o Anderson et al., Brain Res 2020

    Together, studies like these provide either causal evidence for STN involvement via direct electrical stimulation of the nucleus or provide direct recordings of its local field potential activity during stopping. This is not to mention the extensive evidence for the involvement of the STN - and the indirect and hyperdirect pathways in general - in motor inhibition more broadly, perhaps best illustrated by their damage leading to (hemi)ballism.

    Hence, I cannot agree with the idea that the current set of findings "suggest the need to ascribe a separate function to these networks", as suggested in the abstract and further explicated in the discussion of the current paper. For this to be the case, we would need to disregard more than a decade's worth of direct recording studies of the STN in favor of a remote measurement of the BOLD response using (provably) sub-ideal imaging parameters. There are myriad explanations for why fMRI may not be able to reveal a potential ground-truth difference in STN activity between the SS and FS/GO conditions, beginning with the simple proposition that it may not afford sufficient SNR, or that perhaps subcortical BOLD is not tightly related to the type of neurophysiological activity that distinguishes these conditions (in the purported case of the stop-signal task, specifically the beta band). Essentially, this paper shows that a specific lens into subcortical activity is likely broken, yet it then suggests dismissing existing evidence from superior lenses in favor of the findings from the 'broken' lens. That doesn't make much sense to me.

    Second, there is actually another substantial reason why fMRI may indeed be unsuitable to study STN activity, specifically in the stop-signal paradigm: its limited time resolution. The sequence of subcortical processes on each specific trial type in the stop-signal task is purportedly as follows: at baseline, the basal ganglia exert inhibition on the motor system. During motor initiation, this inhibition is lifted via direct pathway innervation. This is when the three trial types start diverging. When actions then have to be rapidly cancelled (SS and FS), cortical regions signal to STN via the hyperdirect pathway that inhibition has to be rapidly reinstated (see Chen, Starr et al., Neuron 2020 for direct evidence for such a monosynaptic hyperdirect pathway, the speed of which directly predicts SSRT). Hence, inhibition is reinstated (too late in the case of FS trials, but early enough in SS trials, see recordings from the BG in Schmidt, Berke et al., Nature Neuroscience 2013; and Diesburg, Wessel et al., eLife 2021).
    Hence, according to this prevailing model, all three trial types involve a sequence of STN activation (initial inhibition), STN deactivation (disinhibition during GO), and STN reactivation (reinstantiation of inhibition during the response via the hyperdirect pathway on SS/FS trials, reinstantiation of inhibition via the indirect pathway after the response on GO trials). What distinguishes the trial types during this period is chiefly the relative timing of the inhibitory process (earliest on SS trials, slightly later on FS trials, latest on GO trials). However, these temporal differences play out on a level of hundreds of milliseconds, and in all three cases, processing concludes well under a second overall. To fMRI, given its limited time resolution, these activations are bound to look quite similar.
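    The temporal-resolution argument above can be made concrete with a quick simulation (a minimal sketch, not from the paper under review; the 200 ms offset between events, the TR, and the double-gamma HRF parameters are all illustrative assumptions): convolving two neural impulses a few hundred milliseconds apart with a canonical HRF yields predicted BOLD time courses that are nearly perfectly correlated at typical fMRI sampling rates.

```python
import numpy as np
from scipy.stats import gamma

# Canonical double-gamma HRF, sampled at 10 ms resolution
dt = 0.01
t = np.arange(0, 30, dt)
hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)
hrf /= hrf.sum()

def predicted_bold(onset_s, tr=2.0, run_s=60.0):
    """Convolve a single impulse at `onset_s` (seconds) with the HRF,
    then downsample to the scanner's TR."""
    n = int(run_s / dt)
    stick = np.zeros(n)
    stick[int(onset_s / dt)] = 1.0
    bold = np.convolve(stick, hrf)[:n]
    return bold[::int(tr / dt)]

# Two hypothetical "STN reactivation" events 200 ms apart,
# as on SS vs. FS trials in the model described above
a = predicted_bold(onset_s=1.0)
b = predicted_bold(onset_s=1.2)
r = np.corrcoef(a, b)[0, 1]
print(round(r, 3))  # correlation near 1: the two predicted BOLD
                    # regressors are practically indistinguishable
```

    At a TR of 2 s, the two regressors are almost collinear, which illustrates why sub-second differences in the timing of an inhibitory process would be very hard to resolve with a standard GLM.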

    Lastly, further building on this logic, it's not surprising that FS trials yield increased activity compared to SS and GO trials. That's because FS trials are errors, which are known to activate the STN (Cavanagh et al., JoCN 2014; Siegert et al. Cortex 2014) and afford additional inhibition of the motor system after their occurrence (Guan et al., JNeuro 2022). Again, fMRI will likely conflate this activity with the abovementioned sequence, resulting in a summation of activity and the highest level of BOLD for FS trials.

    In sum, I believe this study has a lot of merit in demonstrating that fMRI is ill-suited to study the subcortex during the SST, but I cannot agree that it warrants any reappraisal of the subcortex's role in stopping, our understanding of which is not chiefly based on fMRI evidence.

    A few other points:
    - As I said before, this team's previous work has done a lot to convince me that 3T fMRI is unsuitable to study the STN. As such, it would have been nice to see a combination of the subsamples of the study that DID use imaging protocols and field strengths suitable to actually study this node. This is especially true since the second 3T sample (and arguably, the Isherwood_7T sample) does not afford a lot of trials per subject, to begin with.
    - What was the GLM analysis time-locked to on SS and FS trials? The stop-signal or the GO-signal?
    - Why was SSRT calculated using the outdated mean method?
    - The authors chose 3.1 as a z-score to "ensure conservatism", but since they are essentially trying to prove the null hypothesis that there is no increased STN activity on SS trials, I would suggest erring on the side of a more lenient threshold to avoid type-2 error.
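    To make the threshold point concrete (a minimal sketch; z = 3.1 is the value used in the paper, and z = 1.96 is an illustrative more lenient comparison, not a value proposed by the authors): under a one-sided normal approximation, the chosen threshold corresponds to p of roughly 0.001, whereas a lenient threshold corresponds to p of roughly 0.025.

```python
from scipy.stats import norm

# One-sided tail probabilities for two candidate z thresholds
p_conservative = norm.sf(3.1)   # threshold used in the paper
p_lenient = norm.sf(1.96)       # illustrative lenient alternative

print(f"z = 3.10 -> p ~ {p_conservative:.4f}")
print(f"z = 1.96 -> p ~ {p_lenient:.4f}")
```

    When the goal is to argue for the absence of an effect, a stricter cluster-forming threshold makes missing a true effect (a type-2 error) more likely, which is the concern raised here.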
    - The authors state that "The results presented here add to a growing literature exposing inconsistencies in our understanding of the networks underlying successful response inhibition". It would be helpful if the authors cited these studies and what those inconsistencies are.

  10. Reviewer #2 (Public Review):

    This work aggregates data across 5 openly available stopping studies (3 at 7 tesla and 2 at 3 tesla) to evaluate activity patterns across the common contrasts of Failed Stop (FS) > Go, FS > stop success (SS), and SS > Go. Previous work has implicated a set of regions that tend to be positively active in one or more of these contrasts, including the bilateral inferior frontal gyrus, preSMA, and multiple basal ganglia structures. However, the authors argue that upon closer examination, many previous papers have not found subcortical structures to be more active on SS than FS trials, bringing into question whether they play an essential role in (successful) inhibition. In order to evaluate this with more data and power, the authors aggregate across five datasets and find many areas that are *more* active for FS than SS, specifically bilateral preSMA, caudate, GPe, thalamus, and VTA, and unilateral M1, GPi, putamen, SN, and STN. They argue that this brings into question the role of these areas in inhibition, based upon the assumption that areas involved in inhibition should be more active on successful stop than failed stop trials, not the opposite as they observed.

    As an empirical result, I believe that the results are robust, but this work does not attempt a new theoretical synthesis of the neuro-cognitive mechanisms of stopping. Specifically, if these many areas are more active on failed stop than successful stop trials, and (at least some of) these areas are situated in pathways that are traditionally assumed to instantiate response inhibition, like the hyperdirect pathway, then what function are these areas/pathways involved in? I believe that this work would make a larger impact if the authors endeavored to synthesize these results into some kind of theoretical framework for how stopping is instantiated in the brain, even if that framework may be preliminary.

    I also have one main concern about the analysis. The authors use the mean method for computing SSRT, but this has been shown to be more susceptible to distortion from RT slowing (Verbruggen, Chambers & Logan, 2013 Psych Sci), and goes against the consensus recommendation of using the integration with replacement method (Verbruggen et al., 2019). Therefore, I would strongly recommend replacing all mean SSRT estimates with estimates using the integration with replacement method.
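    For concreteness, the integration-with-replacement estimator recommended in the consensus guide can be sketched as follows (a minimal illustration on simulated data; the RT distribution, omission count, P(respond | signal), and mean SSD are made-up values, not taken from any of the five datasets):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated go RTs (ms) with right skew, plus some go omissions
go_rt = rng.lognormal(mean=6.0, sigma=0.25, size=500)
n_omissions = 25      # go trials with no recorded response
p_respond = 0.55      # P(respond | stop signal)
mean_ssd = 250.0      # mean stop-signal delay (ms)

def ssrt_integration_with_replacement(go_rt, n_omissions, p_respond, mean_ssd):
    """Integration-with-replacement SSRT: replace omitted go responses
    with the maximum observed RT, take the p_respond-th quantile of the
    resulting go RT distribution, and subtract the mean SSD."""
    rts = np.concatenate([go_rt, np.full(n_omissions, go_rt.max())])
    nth_rt = np.percentile(rts, p_respond * 100)
    return nth_rt - mean_ssd

def ssrt_mean_method(go_rt, mean_ssd):
    """The (outdated) mean method: mean go RT minus mean SSD.
    Susceptible to distortion from progressive go RT slowing."""
    return go_rt.mean() - mean_ssd

print(ssrt_integration_with_replacement(go_rt, n_omissions, p_respond, mean_ssd))
print(ssrt_mean_method(go_rt, mean_ssd))
```

    The key differences are that the integration method uses the quantile of the go RT distribution matched to the observed response rate on stop trials, and that replacing omissions with the maximum RT corrects the downward bias that omitted slow responses would otherwise introduce.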