Opposing BOLD signals and oxygen metabolism largely arise from statistical uncertainty in metabolic estimates

Curation statements for this article:
  • Curated by eLife

    eLife Assessment

    This manuscript provides a timely and important statistical re-evaluation of a paper by Epp et al. on the discordance of BOLD and CMRO2 measures. The authors present a convincing case, based on rigorous re-analysis of the data, that these previous results arise predominantly from uncertainty in measurement rather than from physiological features. These findings have implications that are of importance to all studies of brain function using BOLD fMRI.

Abstract

Recent work by Epp et al. (2025) reported widespread voxel-wise sign discordance between task-evoked blood-oxygenation-level-dependent (BOLD) responses and estimated changes in cerebral metabolic rate of oxygen (ΔCMRO₂), raising important questions about the interpretability of BOLD functional magnetic resonance imaging. Reanalysing the dataset, we found that ΔCMRO₂ estimates showed substantial voxel-wise variability across participants, consistent with the noise sensitivity of model-based metabolic estimates. When this variability was taken into account, 77.2% of voxels could not be robustly classified, as ΔCMRO₂ effects lacked sufficient statistical support to determine concordance or discordance. Where classification was possible, positive BOLD responses were predominantly concordant with metabolism, whereas discordance was considerably higher for negative BOLD responses. These findings suggest that the observed BOLD–metabolism discordance reported previously largely reflects statistical uncertainty in CMRO₂ estimates rather than widespread physiological sign reversal.
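
The distinction the abstract draws, between classifying a voxel by the sign of its mean ΔCMRO₂ and classifying it only when that mean has statistical support, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `classify_voxel`, the one-sample t statistic, and the fixed threshold `t_crit` are assumptions chosen for illustration, and a real analysis would derive the threshold from the degrees of freedom and a multiple-comparison correction.

```python
from math import sqrt
from statistics import mean, stdev


def classify_voxel(bold_sign, delta_cmro2, t_crit=2.0):
    """Classify one voxel from per-participant ΔCMRO₂ estimates.

    bold_sign   : +1 or -1, sign of the group-level BOLD response.
    delta_cmro2 : per-participant ΔCMRO₂ values for this voxel.
    t_crit      : assumed critical |t| (hypothetical default, not from the paper).
    """
    n = len(delta_cmro2)
    m = mean(delta_cmro2)
    se = stdev(delta_cmro2) / sqrt(n)  # standard error of the mean
    if se == 0:
        t = float("inf") if m != 0 else 0.0
    else:
        t = m / se
    # Without statistical support, the sign of the mean is not interpreted.
    if abs(t) < t_crit:
        return "unclassified"
    return "concordant" if (m > 0) == (bold_sign > 0) else "discordant"


# Illustrative, made-up values: same design, very different variability.
noisy = [4.0, -3.5, 2.0, -1.0, 0.5, -2.5, 3.0, -0.5]      # mean is slightly positive
consistent = [2.0, 2.5, 1.5, 3.0, 2.2, 1.8, 2.6, 2.4]     # mean is clearly positive

print(classify_voxel(+1, noisy))       # sign of the mean alone would say "concordant"
print(classify_voxel(+1, consistent))
print(classify_voxel(-1, consistent))
```

A rule based only on the sign of the group mean would label the noisy voxel concordant with a positive BOLD response, whereas the variability-aware rule leaves it unclassified.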

Article activity feed

  1. eLife Assessment

    This manuscript provides a timely and important statistical re-evaluation of a paper by Epp et al. on the discordance of BOLD and CMRO2 measures. The authors present a convincing case, based on rigorous re-analysis of the data, that these previous results arise predominantly from uncertainty in measurement rather than from physiological features. These findings have implications that are of importance to all studies of brain function using BOLD fMRI.

  2. Reviewer #1 (Public review):

    The study by Epp et al. has indeed received a lot of attention. As so often in the fMRI literature, some voices have blown the results out of proportion, as if they suggested that fMRI cannot be trusted, even though informed researchers are aware of the capabilities and challenges of BOLD as a measure of neural activity. The paper has been discussed and criticized from various angles, e.g. with respect to the unestablished models used to estimate CMRO2, the overestimation of the 40% figure caused by the mask definition, and the neuronal and vascular effects expected to underlie the discordance.

    The first publications arising from these discussions are now appearing, e.g. Chen et al. (https://doi.org/10.1038/s41593-026-02288-y). The manuscript at hand adds to this discussion. Specifically, it provides a direct statistical refutation of the recently proposed widespread physiological sign reversal between BOLD and CMRO2.

    By reanalyzing a high-profile dataset, the authors demonstrate that the previously reported 40% discordance rate is an artifact of statistical uncertainty rather than a genuine physiological phenomenon. This critical re-evaluation restores some confidence in the canonical interpretation of BOLD signals that was recently challenged. It highlights the necessity of rigorous statistical validation in quantitative fMRI.

    The following points should be addressed:

    (1) Absence of evidence is taken as evidence of absence

    The group-level significance analysis, summarized in the horizontal bar chart and cortical surface maps, labels non-significant voxels as 'CMRO2 not reliable', and the discussion concludes that positive BOLD responses are predominantly concordant with metabolism.

    The paper treats voxels with non-significant CMRO2 effects as 'statistically uncertain' rather than as potentially reflecting genuine null metabolic changes, conflating absence of evidence with evidence of absence. Because the 77.2% of voxels shown as light orange could reflect either real null metabolism or insufficient power, the paper cannot distinguish between these. This ambiguity matters because a genuine null metabolic response to positive BOLD would itself be physiologically interesting and would not straightforwardly support 'predominant concordance'.

    (2) Contextualization in other current literature

    I feel that the introduction could also embed the work in the current literature on biophysical processes in areas with negative responses.

    Negative responses have partly been discussed in the literature on quantitative physiology: e.g., Bohraus et al. were able to pinpoint the source of negative CMRO2 in positively activated voxels to large veins (https://doi.org/10.1016/j.celrep.2023.113341). Huber et al. found that neurovascular coupling (arterial-venous weighting) differs between positively and negatively activated brain areas, which makes it hard to interpret derived parameters in terms of physiology.

    (3) Stylistic comments.

    In places, the tone of the language could be revised to ensure that it is perceived as making a constructive contribution to the discussion.
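
Point (1) above, the difference between "no evidence of a metabolic change" and "evidence of no meaningful metabolic change", can be made concrete with an interval-based sketch in the spirit of equivalence testing. Nothing here comes from the manuscript or the review: the function name `interpret_null`, the equivalence `margin` (a smallest effect of interest that would have to be justified physiologically), and the fixed `t_crit` are all assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def interpret_null(delta_cmro2, margin, t_crit=2.0):
    """Three-way reading of a voxel's ΔCMRO₂ confidence interval.

    margin : hypothetical smallest effect of interest (same units as the data).
    t_crit : assumed critical value used to build the interval.
    """
    n = len(delta_cmro2)
    m = mean(delta_cmro2)
    half_width = t_crit * stdev(delta_cmro2) / sqrt(n)
    lo, hi = m - half_width, m + half_width
    if lo > 0 or hi < 0:
        return "effect"          # interval excludes zero
    if -margin < lo and hi < margin:
        return "negligible"      # interval lies inside the margin: support for a null
    return "inconclusive"        # interval too wide to decide either way


# Illustrative, made-up values.
tight_null = [0.1, -0.2, 0.05, -0.1, 0.15, -0.05, 0.0, 0.05]
noisy = [4.0, -3.5, 2.0, -1.0, 0.5, -2.5, 3.0, -0.5]
consistent = [2.0, 2.5, 1.5, 3.0, 2.2, 1.8, 2.6, 2.4]

for name, values in [("tight_null", tight_null), ("noisy", noisy), ("consistent", consistent)]:
    print(name, interpret_null(values, margin=0.5))
```

Under these assumptions, a non-significant voxel is only read as a genuine null when its interval is narrow enough to exclude effects larger than the margin; otherwise it stays inconclusive, which is the ambiguity the reviewer describes.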

  3. Reviewer #2 (Public review):

    Summary:

    The rebuttal aims to provide a statistical re-evaluation of Epp et al. to investigate the effects of CMRO2 uncertainty on concordance/discordance analysis between BOLD signal responses and CMRO2 change estimates based on an R2 framework. The authors observe markedly higher variance in CMRO2 compared to BOLD, which raises concerns about sign classification purely based on group means/medians.

    Strengths:

    The study is well motivated, and the analytical pipeline is rigorous and has been provided. Overall, the manuscript provides several thoughtful and rigorous analyses that contribute meaningfully to the ongoing discussion surrounding neurovascular coupling and CMRO₂ estimation.

    Weaknesses:

    Some aspects of the analytical framework could be improved, as could the discussion of the caveats of the methods used in this and the original paper.

    (1) The binomial framework discussed on line 110 and described on line 321 reduces continuous ΔBOLD and ΔCMRO2 measurements to binary concordant/discordant labels, which may overemphasize unstable sign flips near zero effect sizes while discarding potentially meaningful magnitude information. The authors acknowledge that this overly strict approach yields very few meaningful voxels. A better justification or explanation of what we are meant to take away from this, other than the variability in the measurement, which is also explored elsewhere, would be helpful to the reader.

    (2) In the Methods section entitled "Voxel Selection: BOLD Activation Mask", the authors describe their more traditional univariate statistical method, in contrast to the PLS approach used in the Epp paper. While I appreciate why the authors chose this approach, which simplifies interpretation, is it possible that it led to a lower number of discordant voxels? If so, I would suggest also adding this to the discussion of how the original Epp paper's methodological choices led to the very large percentage of discordant voxels.

    (3) In the original paper, it looks to me like the discordant voxels have low CBF change and low rOEF. The gadolinium-based CBV measurement used to calculate OEF is a measure of total blood volume, while the blood volume that contributes to BOLD resides predominantly in veins and capillaries. Given the long PLD of the ASL acquisition and the total blood volume measurement, it seems possible to me that discordant voxels have high arterial blood volume, leading to an overly large CBV measurement and an underestimation of CBF at this PLD (especially given the participants' young age, for which I would expect ATT to be closer to 1-1.5s based on recent literature). While this is not currently discussed in this paper, it might be relevant to discuss how acquisition choices could bias some voxels towards erroneous CMRO2 estimates, which in turn would lead to these voxels being identified as discordant.

    (4) In the methods, on line 267, the authors describe how they calculated ΔCMRO2 and how it differs from the original paper. A short discussion of how this choice is likely to affect the variance estimates would be warranted, given that the original paper seems to have chosen its method for the explicit purpose of decreasing error propagation. In particular, I wonder if this difference could account for the observation that "77.2% of voxels showed no statistically significant group-level ΔCMRO₂ effect".
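
The concern raised in point (1) of this review, that a binomial framework keeps only the sign of each measurement, can be illustrated with a minimal exact sign test. This is a sketch of the general technique and may differ from the authors' implementation; the function name `sign_concordance_pvalue` and the example values are hypothetical.

```python
from math import comb


def sign_concordance_pvalue(delta_bold, delta_cmro2):
    """Exact two-sided binomial (sign) test on per-participant concordance.

    Each participant contributes one binary label: do ΔBOLD and ΔCMRO₂
    share a sign? Magnitudes are discarded; exact zeros are dropped.
    Returns (concordant count, usable participants, p-value under p = 0.5).
    """
    pairs = [(b, c) for b, c in zip(delta_bold, delta_cmro2) if b != 0 and c != 0]
    n = len(pairs)
    k = sum(1 for b, c in pairs if (b > 0) == (c > 0))
    # The null distribution is symmetric, so double the upper tail and cap at 1.
    tail = max(k, n - k)
    upper = sum(comb(n, i) for i in range(tail, n + 1)) / 2 ** n
    return k, n, min(1.0, 2 * upper)


bold = [1.0] * 10                       # ten participants, all positive ΔBOLD
large = [5.0] * 10                      # large positive ΔCMRO₂
tiny = [0.01] * 10                      # near-zero positive ΔCMRO₂
flipping = [0.01, -0.01] * 5            # near-zero values that flip sign

print(sign_concordance_pvalue(bold, large))
print(sign_concordance_pvalue(bold, tiny))      # identical result to the line above
print(sign_concordance_pvalue(bold, flipping))
```

The first two calls return the same counts and p-value although the underlying effect sizes differ by a factor of 500, and the third shows how sign flips around zero alone determine the outcome. This is the loss of magnitude information the reviewer points to.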