Improving the accuracy of single-trial fMRI response estimates using GLMsingle

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    Functional magnetic resonance imaging (fMRI) yields a notoriously noisy and autocorrelated signal, and the GLMsingle method presented here by Prince and colleagues demonstrably improves the estimation of responses evoked by single trials. This open source toolbox is implemented in a user-friendly manner and will be of interest to researchers using human neuroimaging to study neural responses in condition-rich designs, as is increasingly common in cognitive neuroscience experiments.

This article has been reviewed by the following groups


Abstract

Advances in artificial intelligence have inspired a paradigm shift in human neuroscience, yielding large-scale functional magnetic resonance imaging (fMRI) datasets that provide high-resolution brain responses to thousands of naturalistic visual stimuli. Because such experiments necessarily involve brief stimulus durations and few repetitions of each stimulus, achieving sufficient signal-to-noise ratio can be a major challenge. We address this challenge by introducing GLMsingle, a scalable, user-friendly toolbox available in MATLAB and Python that enables accurate estimation of single-trial fMRI responses (glmsingle.org). Requiring only fMRI time-series data and a design matrix as inputs, GLMsingle integrates three techniques for improving the accuracy of trial-wise general linear model (GLM) beta estimates. First, for each voxel, a custom hemodynamic response function (HRF) is identified from a library of candidate functions. Second, cross-validation is used to derive a set of noise regressors from voxels unrelated to the experiment. Third, to improve the stability of beta estimates for closely spaced trials, betas are regularized on a voxel-wise basis using ridge regression. Applying GLMsingle to the Natural Scenes Dataset and BOLD5000, we find that GLMsingle substantially improves the reliability of beta estimates across visually responsive cortex in all subjects. Comparable improvements in reliability are also observed in a smaller-scale auditory dataset from the StudyForrest experiment. These improvements translate into tangible benefits for higher-level analyses relevant to systems and cognitive neuroscience. We demonstrate that GLMsingle: (i) helps decorrelate response estimates between trials nearby in time; (ii) enhances representational similarity between subjects within and across datasets; and (iii) boosts one-versus-many decoding of visual stimuli. GLMsingle is a publicly available tool that can significantly improve the quality of past, present, and future neuroimaging datasets sampling brain activity across many experimental conditions.
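The third technique, voxel-wise ridge regularization of trial-wise betas, can be illustrated in miniature. The following is a toy numpy sketch, not the toolbox's actual implementation: the HRF shape, candidate shrinkage levels, and the selection of a shrinkage level per voxel against known ground truth are all assumptions made for illustration (in practice the truth is unknown and GLMsingle cross-validates the shrinkage instead).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: 100 time points, 20 closely spaced trials, 5 voxels.
# Trials occur every 4 TRs, so adjacent responses overlap substantially.
n_t, n_trials, n_vox = 100, 20, 5
hrf = np.array([0.0, 0.3, 0.7, 1.0, 0.7, 0.3, 0.1])  # assumed toy HRF
X = np.zeros((n_t, n_trials))
for i in range(n_trials):
    onset = 4 * i
    end = min(n_t, onset + len(hrf))
    X[onset:end, i] = hrf[: end - onset]

true_betas = rng.normal(size=(n_trials, n_vox))
Y = X @ true_betas + 0.5 * rng.normal(size=(n_t, n_vox))

def ridge_betas(X, Y, lam):
    """Closed-form ridge solution (X'X + lam*I)^-1 X'Y."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ Y)

# Pick a shrinkage level per voxel. Here we cheat and compare against
# the known true betas; the real toolbox uses cross-validation.
lams = [0.0, 1.0, 10.0, 100.0]
best_lam = []
for v in range(n_vox):
    errs = [np.mean((ridge_betas(X, Y, lam)[:, v] - true_betas[:, v]) ** 2)
            for lam in lams]
    best_lam.append(lams[int(np.argmin(errs))])
```

The key design point this illustrates is that shrinkage trades a small bias for a large reduction in variance when regressors overlap in time, and that the optimal amount of shrinkage can differ from voxel to voxel.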

Article activity feed

  2. Reviewer #1 (Public Review):

    In this paper, Prince and colleagues present a new toolbox, GLMsingle, for estimating single-trial BOLD responses from fMRI data. This is an important problem: fMRI is the most widely used human neuroimaging method, but the signal it yields is notoriously noisy and autocorrelated. This is especially problematic when one is interested in the response to events that occur only a few times rather than being repeated many times. Such scenarios are increasingly common as cognitive neuroscience studies grow in complexity, for instance studies that present many naturalistic images or videos, each repeated only a few times in the scanner, or studies of responses to surprising events, which by definition cannot be repeated many times.

    Prince and colleagues convincingly show that their GLMsingle toolbox, which combines three techniques that had previously been used independently but not studied together, strongly improves the reliability of single-trial BOLD estimates in two different fMRI datasets. Furthermore, the estimated responses were not only more reliable but also contained more information about stimulus identity, indicating that the estimates better capture the underlying neural representations. This is an exciting development and suggests that this toolbox will be useful to many, if not most, human neuroimaging scientists.

    Some questions remain. For instance, would the reduction in autocorrelation achieved by ridge regression, while desirable for removing the confounding autocorrelation due to the slow haemodynamics underlying the BOLD signal, also remove genuine neural autocorrelation caused by, for instance, stimulus-specific adaptation and serial dependence? Also, the two datasets used here had fixed intertrial intervals; would the benefits be as significant for studies employing jittered intertrial intervals?

    Overall, the GLMsingle method presented here promises to be of great benefit to human neuroimaging researchers interested in studying infrequent or rare events.

  3. Reviewer #2 (Public Review):

    General linear modelling (GLM) forms a cornerstone of analyses of task-based functional magnetic resonance imaging (fMRI) data. Obtaining robust single-trial fMRI beta estimates is a difficult problem given the relatively high levels of noise in fMRI data. The introduced toolbox significantly improves such estimates using a few key features: 1) estimating a separate hemodynamic response function (HRF) for each voxel, 2) including noise regressors, derived using a cross-validated approach, to improve the reliability of the betas across repetitions, and 3) applying ridge regression to the beta estimates. The authors explain these steps well and compare the results obtained on subsequent metrics when choosing whether or not to include the different features of this procedure. They also compare their new approach to the Least-Squares Separate technique for beta estimation. For their demonstrations, they use two condition-rich datasets (NSD and BOLD5000) to show the improvements that different components of GLMsingle afford.
    The metrics used for comparisons are well chosen and relevant. In particular, test-retest reliability of GLM beta profiles is often a prerequisite for most subsequent analyses. Additionally, they consider temporal autocorrelation between beta estimates of neighbouring trials, as well as potential downstream analyses, looking at representational similarity analysis and condition classification. Thus, they consider a broad range of possible applications and provide the reader with useful pointers for inspecting what is most relevant to a given application.
    This manuscript and toolbox present a major advancement for the field of neuroimaging and should be of interest to essentially any researcher working with task-based fMRI data.
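The first of the features listed above, selecting a per-voxel HRF from a library of candidates, can be sketched in a toy numpy simulation. The three gamma-shaped candidate HRFs and all names below are illustrative assumptions; GLMsingle's actual library and its cross-validated selection procedure differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def gamma_hrf(t, peak):
    """Toy gamma-shaped HRF peaking near `peak` seconds (illustrative only)."""
    h = (t / peak) ** 2 * np.exp(-(t - peak) / 2.0)
    return h / h.max()

t = np.arange(0, 20, 1.0)                       # 1 s sampling, 20 s support
library = [gamma_hrf(t, p) for p in (4.0, 5.0, 6.0)]

# Simulate one voxel whose true HRF peaks at 6 s.
n_t = 200
onsets = np.arange(0, 180, 12)
stim = np.zeros(n_t)
stim[onsets] = 1.0
y = np.convolve(stim, gamma_hrf(t, 6.0))[:n_t] + 0.3 * rng.normal(size=n_t)

def r2_for(hrf):
    """R^2 of a GLM whose predictor is the stimulus convolved with `hrf`."""
    x = np.convolve(stim, hrf)[:n_t]
    Xd = np.stack([x, np.ones(n_t)], axis=1)    # predictor plus intercept
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return 1.0 - resid.var() / y.var()

scores = [r2_for(h) for h in library]
best_idx = int(np.argmax(scores))               # index of best-fitting HRF
```

Repeating this fit for every voxel, each with its own winning library entry, captures the spatial variability in hemodynamics that a single fixed HRF cannot.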

    The strengths of the manuscript and toolbox are numerous, and the presented results are convincing. To further the impact of the toolbox and paper, the authors could provide more guidelines on implementation for various common uses of the toolbox and/or factors to consider when deciding which steps to implement in one's analysis pipeline (FitHRF, DenoiseGLM, RR).

    Additionally, there are a few considerations that could be addressed directly:

    1. The authors use cross-validation to determine the number of nuisance regressors to add to the model. Thus, any variability in responses to a single condition is treated as 'noise'. How might this influence a potential use of single-trial estimates to assess brain-behaviour correlations (e.g. differences in behavioural responses to a single condition), or within-session learning effects? For such uses, would the authors suggest instead using LSS or a subset of the features in GLMsingle (i.e. not using GLMdenoise)?
    2. In the results, using a fixed HRF leads to drastically lower performance on a variety of subsequent measures compared to fitting an HRF to each voxel, especially with regard to beta map test-retest reliability (Fig. 2-3). Have the authors ensured that the fixed HRF chosen is the most appropriate one for the region of interest? In other words, is the chosen HRF also the one selected for most voxels under the flexible option?

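The cross-validated choice of how many noise regressors to include, raised in point 1 above, can be sketched in a toy numpy simulation. The simulated noise structure, names such as simulate_run, and the use of split-half reliability as the selection criterion are illustrative assumptions, not the toolbox's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: 12 trials with HRF-convolved responses; "signal" voxels share
# a slow structured noise time course with task-unrelated "noise pool" voxels.
n_t, n_trials, n_sig, n_pool = 120, 12, 10, 30
hrf = np.array([0.3, 0.8, 1.0, 0.6, 0.2])
X = np.zeros((n_t, n_trials))
for i in range(n_trials):
    X[i * 10 : i * 10 + len(hrf), i] = hrf

true_b = rng.normal(size=(n_trials, n_sig))

def simulate_run():
    noise_t = np.cumsum(rng.normal(size=n_t))          # slow structured noise
    noise_t = (noise_t - noise_t.mean()) / noise_t.std()
    pool = (noise_t[:, None] * rng.normal(size=(1, n_pool))
            + 0.2 * rng.normal(size=(n_t, n_pool)))
    sig = (X @ true_b
           + 2.0 * noise_t[:, None] * rng.normal(size=(1, n_sig))
           + 0.3 * rng.normal(size=(n_t, n_sig)))
    return sig, pool

def fit_betas(Y, pool, k):
    """GLM betas with the top-k noise-pool principal components appended
    as nuisance regressors."""
    U = np.linalg.svd(pool - pool.mean(0), full_matrices=False)[0]
    Xf = np.hstack([X, U[:, :k]]) if k else X
    return np.linalg.lstsq(Xf, Y, rcond=None)[0][:n_trials]

(Y1, P1), (Y2, P2) = simulate_run(), simulate_run()

def reliability(k):
    """Mean across voxels of the trial-profile correlation between runs."""
    B1, B2 = fit_betas(Y1, P1, k), fit_betas(Y2, P2, k)
    return np.mean([np.corrcoef(B1[:, v], B2[:, v])[0, 1]
                    for v in range(n_sig)])

rel = [reliability(k) for k in range(4)]
best_k = int(np.argmax(rel))   # number of noise PCs maximizing reliability
```

Because the criterion is cross-run reliability of condition responses, any trial-to-trial variability that does not repeat across runs is, by construction, treated as noise; this is exactly the property the reviewer's question probes.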
  4. Reviewer #3 (Public Review):

    Prince et al set out to develop and demonstrate a toolbox for application to fMRI data collected in condition-rich designs, which are characterized by having a large number of conditions, each with a fairly small number of instances within a single participant. This describes a fairly small minority of all fMRI studies conducted currently, but is nonetheless an active area of research, which the GLMsingle toolbox has the potential to benefit. Because these designs benefit less from the trial-averaging approach of the standard GLM, any step that can increase SNR will have an outsized influence on the quality of the results.

    The description of the logic and basic steps instantiated in the toolbox is clear and easy to understand for any researcher with background in this area. Likewise, the analyses the authors conduct to validate the toolbox are reasonable, and are described fairly clearly. If I were conducting a study using this sort of design, I would certainly try out the off-the-shelf version of the toolbox, although I would also do so provisionally, and run my own checks to ensure that the results were not biased or degraded by the toolbox.

    Overall, there are few weaknesses in the methods or results presented in this article, if it is taken as a description of a specific approach developed and employed elsewhere, rather than as a comprehensive test of possible approaches to solve the problems the authors outline. In other words, the article raises the question of what the effect would be of substituting the vast majority of specific choices made in this toolbox, in terms of the algorithms used, the order in which they are applied, and so forth. Given that this is an initial introduction of the toolbox, and the complexity of the article as it stands, it is more than reasonable for the authors to forego this kind of comprehensive comparison, but readers may nonetheless be left wondering about the optimality of this specific implementation. Likewise, given the small number of datasets that have the necessary characteristics, it is not surprising that the validation of the toolbox relies on (a large amount of data from) N=8. The authors' point that the two datasets chosen differ in a number of ways is well taken, but it remains an open question to what extent the results presented here will generalize to other datasets.

    As for strengths, the sheer number of different metrics the authors used to validate the toolbox, covering intra- and inter-subject measures, and distinct analysis methods including RSA and MVPA, was impressive, as was the care in thinking about important issues such as voxel selection. Overall, the methods and results reflect a high degree of expertise and hard work on the part of the authors.

    My overall assessment is that the utility of the toolbox itself may be limited, given the relative rarity of this type of design, at least at present. However, in pointing out new avenues for scholars working in this general area to pursue, for instance exploring the impact of voxel-specific HRFs or regularization, this paper may have a larger influence, insofar as some of the techniques employed here may eventually find use in other, more widespread use cases.