Standardizing workflows in imaging transcriptomics with the abagen toolbox

Curation statements for this article:
  • Curated by eLife


    Evaluation Summary:

    This paper will be of interest to scientists studying the large-scale transcriptomic organization of the human brain, and in particular those who have used or plan to use the Allen Human Brain Atlas dataset. The study is well-motivated and novel. The most striking finding is the magnitude of variability that is introduced by different data-processing decisions. The open-source software described in this study is an important contribution to the field and will be of broad utility.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1, Reviewer #2 and Reviewer #3 agreed to share their name with the authors.)


Abstract

Gene expression fundamentally shapes the structural and functional architecture of the human brain. Open-access transcriptomic datasets like the Allen Human Brain Atlas provide an unprecedented ability to examine these mechanisms in vivo; however, a lack of standardization across research groups has given rise to myriad processing pipelines for using these data. Here, we develop the abagen toolbox, an open-access software package for working with transcriptomic data, and use it to examine how methodological variability influences the outcomes of research using the Allen Human Brain Atlas. Applying three prototypical analyses to the outputs of 750,000 unique processing pipelines, we find that choice of pipeline has a large impact on research findings, with parameters commonly varied in the literature influencing correlations between derived gene expression and other imaging phenotypes by as much as ρ ≥ 1.0. Our results further reveal an ordering of parameter importance, with processing steps that influence gene normalization yielding the greatest impact on downstream statistical inferences and conclusions. The presented work and the development of the abagen toolbox lay the foundation for more standardized and systematic research in imaging transcriptomics, and will help to advance future understanding of the influence of gene expression in the human brain.

Article activity feed

  1. Author Response:

    Reviewer #2 (Public Review):

    In this manuscript, Markello and colleagues exhaustively characterize the impact and relative importance of the many data-processing decisions that go into constructing whole-brain transcriptomic maps from microarray data in the Allen Human Brain Atlas. The authors motivate the need for and have developed an open-source toolbox, abagen, for standardizing workflows in imaging transcriptomics. The authors propose a taxonomy of analyses commonly performed on these data in the literature; they then use abagen to compute the distributions of statistical outcomes for three prototypical analyses across 750,000 combinatorial choices of end-to-end data-processing pipelines. Informed by these findings, the authors then place into context several specific pipelines reported in recent and influential studies.

    The paper is well-written and the authors are successful in illustrating and attempting to address the need for standardized and systematic research in the burgeoning field of imaging transcriptomics. The abagen toolbox is an important contribution and is to my knowledge the current state-of-the-art. The code is clean, flexible, and very well-documented. The chief weakness of this paper is the lack of clear guidance on best practices. Readers should, however, be sympathetic to the fact that there is currently a lack of ground-truth data against which to benchmark different data-processing pipelines.

    Even after reading the paper thoroughly, it's still not completely clear to me whether the analyses in this study are performed for cortex only, or at the whole-brain level (or bi- or uni-laterally for that matter). I'm assuming this study is cortex-only as you say in the methods that "the brain atlas used in the current manuscript represents only cortical parcels." But abagen supports joint cortical+subcortical atlases too. It'd be helpful to readers to make this explicit.

    To ensure comparability across both the volumetric and surface-based versions of the Desikan-Killiany parcellation examined in our analyses, we investigated bilateral cortical samples (i.e., we omitted samples from the cerebellum, subcortex, and brainstem). We have clarified this in the manuscript (“Materials and Methods” section, “Data” subsection, “Parcellations” subsubsection):

    “To facilitate comparison between volumetric- and surface-based parcellations, samples from the cerebellum, subcortex and brainstem were omitted.”

    Along similar lines, do you expect any of the main findings of this study to change when deriving whole-brain maps?

We anticipate that examining whole-brain gene expression—rather than just cortical expression as in the current manuscript—would likely strengthen the primary findings of our analyses for several reasons. Primarily, there are known differences in gene expression values between cortical and subcortical / brainstem / cerebellar tissue samples in the AHBA (Arnatkevičiūtė et al., 2019). We expect that differentially normalizing these samples across pipelines would therefore result in greater differences between effect estimates for the three examined analyses. In a similar vein, we expect that the rankings of parameter importance would likely remain stable, especially at the extremes. It is possible that some parameters related to normalization (e.g., normalize matched, normalize structures) may move up in rankings; however, overall, the qualitative interpretation of these results is likely to remain unchanged.

    We have revised the Discussion to highlight this consideration (paragraph #4):

"Although we only considered cortical tissue samples in the current analyses, we expect that including non-cortical samples would further reinforce these results (Arnatkevičiūtė et al., 2019), as known differences in microarray expression values between cortex and subcortical structures will likely emphasize the impact of different normalization procedures across pipelines."

Arnatkevičiūtė, A., Fulcher, B. D., & Fornito, A. (2019). A practical guide to linking brain-wide gene expression and neuroimaging data. NeuroImage, 189, 353-367.

    Would it make sense to use PET maps or another type of neuroimaging data as a (pseudo-)benchmark in a future study?

This is a great question and an area of ongoing research, including in our own group. The few studies that have compared PET data with the AHBA have shown that the spatial correlation between gene expression and receptor density is highly variable, with correspondence strongly dependent on the genes and receptors being considered (Beliveau et al., 2017; Martins et al., 2021). This is likely due to the fact that gene expression (as measured by mRNA) is not equivalent to protein synthesis and that PET tracers vary in their specificity and sensitivity for specific receptors. Our group is currently collating a large sample of PET datasets from multiple tracers to demonstrate this lack of correspondence (work forthcoming; presented by Hansen et al., 2021). Given this, we would be hesitant to suggest such a comparison as a benchmark.

Comparisons of microarray expression data with the RNAseq data also included in the AHBA (as performed in Arnatkevičiūtė et al., 2019) are also feasible; however, given that some of the pipelines in the current manuscript utilize the RNAseq data to determine probe selection, we felt that using this as a benchmark would be biased. Alternatively, a different dataset (e.g., PsychENCODE) could be used; unfortunately, in these datasets the precise spatial locations of collected samples are uncertain, and for that reason we would also hesitate to use them as a reference.

Martins, D., Giacomel, A., Williams, S. C., Turkheimer, F. E., Dipasquale, O., Veronese, M., & PET templates working group. (2021). Imaging transcriptomics: Convergent cellular, transcriptomic, and molecular neuroimaging signatures in the healthy adult human brain. bioRxiv.

    Beliveau, V., Ganz, M., Feng, L., Ozenne, B., Højgaard, L., Fisher, P. M., ... & Knudsen, G. M. (2017). A high-resolution in vivo atlas of the human brain's serotonin system. Journal of Neuroscience, 37(1), 120-128.

    Hansen, J. Y., Markello, R. D., Palomero-Gallagher, N., Dagher, A., & Misic, B. (2021). Correspondence between gene expression and neurotransmitter receptor and transporter density in the human cortex. In 13th International Symposium of Functional Neuroreceptor Mapping of the Living Brain.

    What about a cross-validation strategy where data are selectively withheld during processing and then predicted after the fact? This may only be possible for a subset of genes and/or pipelines, but it could nonetheless be informative.

A cross-validation strategy is feasible; however, it will depend on what exactly you are trying to assess. Which features are omitted (i.e., samples or genes) will be strongly influenced by the research question and null hypothesis being tested. For example, when examining the distance-dependent relationship of correlated gene expression, you could leave some tissue samples out and "predict" the fit of these samples (e.g., as in Hansen et al., 2021). As the reviewer suggests, a cross-validation strategy will thus only be possible for some specific research questions, but not generally for entire pipelines.

One alternative that would be applicable in many cases would be to examine the robustness of the observed effects via a leave-one-donor-out strategy, whereby analyses are repeated six times, omitting one donor each time, to ensure that none of the donors are unduly influencing analytic estimates (Vogel et al., 2020; Arnatkevičiūtė et al., 2019). This may require careful interpretation, however, as different donors contribute variable numbers of samples, and so gene expression estimates will have variable spatial coverage across folds.

    We have added the following text to the Discussion to expand on these points (paragraph #9):

"One potential solution to this could be to examine the robustness of pipelines based on a leave-one-donor-out strategy (e.g., Vogel et al., 2020; Arnatkevičiūtė et al., 2019), wherein analyses are repeated six times, omitting one donor each time, to ensure that none of the donors are unduly influencing analytic estimates. This approach is likely to become more useful as data from more individuals become available, but at present may be a worthwhile approach for assessing whether chosen processing parameters are appropriate."
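As an illustration, the leave-one-donor-out check described above can be sketched on synthetic data. This is a toy example, not the abagen API: the donor-level expression arrays, the phenotype map, and the "effect" (a simple regional correlation) are all hypothetical stand-ins for a real analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_donors, n_regions, n_genes = 6, 34, 100

# synthetic per-donor expression: a shared spatial signal plus donor-specific noise
shared = rng.normal(size=(n_regions, n_genes))
donor_expr = shared[None] + 0.5 * rng.normal(size=(n_donors, n_regions, n_genes))

# toy imaging phenotype loosely related to the first gene's spatial pattern
phenotype = shared[:, 0] + 0.3 * rng.normal(size=n_regions)

def effect(expression):
    """Correlate the first gene's regional expression with the phenotype."""
    return np.corrcoef(expression[:, 0], phenotype)[0, 1]

# estimate from all six donors pooled together
full_estimate = effect(donor_expr.mean(axis=0))

# repeat the analysis six times, omitting one donor each time
loo_estimates = [
    effect(np.delete(donor_expr, d, axis=0).mean(axis=0))
    for d in range(n_donors)
]

print(full_estimate, loo_estimates)
```

If no single donor dominates, the six leave-one-out estimates should cluster tightly around the full-sample estimate; a fold that deviates sharply would flag an unduly influential donor.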

    In the discussion, you claim that "the optimal set of processing parameters will very likely vary based on research question." I'd like to see this elaborated on a bit further, at least for the most important parameters. For example, when would it make more sense to use one form of gene normalization over the other? What are the implicit assumptions underlying each choice?

This is an important aspect of processing the AHBA data: not only do we believe that the optimal set of processing parameters will vary based on the research question, but which processing parameters are most important may also vary with it.

Gene normalization is a great example. Some genes have very low expression values whereas others have very high expression, and this variability can influence downstream analyses. For example, consider the distance-dependent correlated gene expression (CGE) analysis shown in the manuscript: CGE values derived from non-normalized gene expression values will be high because the correlation will be driven by these differences in expression levels across genes rather than by common patterns of expression. Normalizing expression values will therefore result in CGE values being more broadly distributed and better capturing shared spatial expression patterns.
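This inflation is easy to demonstrate on toy data (not AHBA data; the region and gene counts are arbitrary): when gene-level baselines differ wildly, any two regions' expression profiles correlate strongly simply because both track those baselines, and z-scoring each gene removes the artifact.

```python
import numpy as np

rng = np.random.default_rng(1)
n_regions, n_genes = 34, 500

# genes with wildly different baseline expression levels, plus small noise
gene_means = rng.uniform(0, 100, size=n_genes)
expr = gene_means[None, :] + rng.normal(size=(n_regions, n_genes))

# CGE between two regions = correlation of their expression profiles across genes
raw_cge = np.corrcoef(expr[0], expr[1])[0, 1]

# z-score each gene across regions, then recompute
expr_z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
norm_cge = np.corrcoef(expr_z[0], expr_z[1])[0, 1]

print(f"raw CGE: {raw_cge:.3f}, normalized CGE: {norm_cge:.3f}")
```

With the seeded data above, the raw CGE is near 1 despite the regions sharing no spatial signal, while the normalized CGE is near 0.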

More generally, gene-expression values in the AHBA are imperfect; it is an open problem in transcriptomics to obtain measures of expression that are comparable across genes. Throughout the literature, research has shown that the binding strength of in situ hybridization depends on properties of the RNA sequence used in the binding process, making it difficult to compare "raw" values across different genes. As such, gene normalization allows for a fairer comparison of expression patterns across probes.

    However, even if we were able to obtain perfect measurements that were comparable across genes, there are contexts where researchers may want to retain the variance contributed by genes to accurately reflect their relative expression levels. For example, since many genes measured in the AHBA are not brain-specific, normalization will amplify their noisy expression patterns, potentially obscuring more relevant expression information. This can be avoided by sub-selecting genes in a hypothesis-driven manner, but, as before, this will depend on the research question.

Within the forms of gene normalization examined (i.e., z-scoring, scaled robust sigmoid normalization), we believe that scaled robust sigmoid is the optimal choice as it is less sensitive to outliers, which are known to exist in microarray-based transcriptomic data (Fulcher et al., 2019; Arnatkevičiūtė et al., 2019).

    We have added text to the Discussion to expand on these points (paragraph #9):

    “For instance, in most applications gene normalization is appropriate, as it ensures that downstream analyses are not driven by a small subset of highly expressed genes. However, in other applications it may be desirable to retain the variance contributed by genes to accurately reflect their relative expression levels. For example, many genes in AHBA are not brain-specific, so normalization will amplify their expression patterns, potentially obscuring more relevant expression information. This can be avoided by sub-selecting genes in a hypothesis-driven manner and skipping the normalization step altogether.”
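The contrast between the two normalization choices discussed above can be sketched on a toy expression vector containing one extreme outlier. The scaled robust sigmoid form below follows our reading of Fulcher et al. (median and IQR in place of mean and SD, squashed through a sigmoid, then unit-rescaled); treat the exact constants as an assumption rather than the abagen implementation.

```python
import numpy as np

def zscore(x):
    return (x - x.mean()) / x.std()

def scaled_robust_sigmoid(x):
    med = np.median(x)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    # robust sigmoid: median/IQR replace mean/SD, bounding the outlier's influence
    y = 1.0 / (1.0 + np.exp(-(x - med) / (iqr / 1.35)))
    # rescale to the unit interval
    return (y - y.min()) / (y.max() - y.min())

# regional expression for one gene, with a single extreme outlier sample
expression = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 25.0])

z = zscore(expression)
srs = scaled_robust_sigmoid(expression)

print("z-score of outlier:", z[-1])
print("SRS values:", srs)
```

Because the outlier inflates the standard deviation, the z-scores of the typical samples are compressed together, whereas the SRS output stays bounded in [0, 1] and preserves the spread among the non-outlier samples.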

    Is there anything to be said about the order of operations? There seem to be several steps in Table 1 which could conceivably be interchanged. If nothing else, this procedural ambiguity is yet another good reason to standardize workflows.

    We believe that the importance of processing order is strongly dependent on which processing steps are being considered. For example, intensity-based filtering of probes must always be performed before probe selection—reversing the order of these operations would, in the majority of cases, be problematic because it would potentially result in the selection of noisy probes to be carried through to analysis. However, the order of other steps (i.e., sample versus gene normalization) could arguably be reversed with no ostensible detriment. We agree with the reviewer that this ambiguity is a good reason to standardize these workflows, and believe that the order of operations implemented in abagen and described in the manuscript is a principled solution to this problem.

    We have added text to the Discussion to clarify this point (paragraph #5):

    “Note that there are some processing steps that should be performed in a specific sequence, and others whose order could potentially be interchanged. For example, intensity-based filtering of probes must always be performed before probe selection—reversing the order of these operations would, in the majority of cases, be problematic because it would potentially result in the selection of noisy probes to be carried through to analysis. However, the order of other steps (e.g., sample versus gene normalization) could arguably be reversed with no ostensible detriment. This procedural ambiguity is a salient example of the need to standardize workflows.”
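The ordering argument above can be made concrete with a hypothetical two-probe example: a noisy probe has the highest mean expression but fails an intensity-based filter. Probe names, values, and the 50% threshold are illustrative assumptions, not AHBA defaults.

```python
probes = [
    # (probe_id, mean_expression, fraction_of_samples_above_background)
    ("probe_A", 9.1, 0.20),   # bright, but mostly background noise
    ("probe_B", 6.4, 0.85),   # dimmer, but reliably detected
]

def intensity_filter(candidates, threshold=0.5):
    """Keep probes detected above background in enough samples."""
    return [p for p in candidates if p[2] >= threshold]

def select_probe(candidates):
    """Pick one probe per gene (here: highest mean expression)."""
    return max(candidates, key=lambda p: p[1])

# correct order: filter, then select -> the reliable probe is carried forward
chosen_filtered_first = select_probe(intensity_filter(probes))

# reversed order: select, then filter -> the noisy probe is chosen, and the
# gene is lost entirely if the filter is applied afterwards
chosen_selected_first = intensity_filter([select_probe(probes)])

print(chosen_filtered_first, chosen_selected_first)
```

Filtering first yields `probe_B`; selecting first picks `probe_A`, which the subsequent filter removes, leaving no probe for the gene at all.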

    I particularly liked the analysis in Figure 2A and thought it made a nice contribution to the paper.

    We appreciate the reviewer's kind words, especially given their extensive foundational work in this field.

    Reviewer #3 (Public Review):

The work “Standardizing workflows in imaging transcriptomics with the Abagen toolbox” is a major meta-analysis pipeline workflow for comparing and integrating parameter choices in imaging transcriptomics using the Allen Human Brain Atlas (AHBA). The release of the AHBA has strongly increased interest in determining transcriptomic associations in brain imaging studies, yet there is much variability in the analyses, methods used, and subsequent interpretation.

This work is illustrative of an important trend in informatics analysis: allowing users strong metadata control so as to access and implement optimal parameter choices and to study their distribution. The work, implemented as an open-source Python toolkit, is likely to be of importance to analysts working in these areas.

It would be helpful to clarify and specifically define the term “pipeline” as a specific set of parameter, normalization, and other choices that are selected. Whereas this term is in common use in the field, in the present work the meaning is specific to a set of selectable options. Of course, any number of such variable selections could be implemented in the Abagen toolbox; it will help for clarity to define this term up front.

    We have added text to the Results clarifying what we mean when we refer to "pipeline" (“Results” section):

“We refer to each unique set of processing choices and parameters as a ‘pipeline’.”

    Similarly, we have added text to the Methods to clarify this as well:

“Each unique set of these 17 processing choices and parameters constitutes a pipeline, yielding 746,946 unique pipelines.”

My major consideration in this work concerns two issues. The first is how to characterize and summarize the pipeline output produced by Abagen. The manuscript illustrates the workflows and various means of summarizing results but does not offer guidance on the preferred interpretation or relative value of the results. Whereas we may argue that the primary purpose of Abagen is to run the various pipelines, allowing downstream interpretation by the user, it would be helpful to understand how the Abagen toolbox organizes, summarizes, and sets these output options up for interpretation. This appears to be only weakly addressed in the present manuscript.

The primary output of abagen is a single brain region x gene expression matrix based on a researcher-specified atlas. We believe this is the simplest and most fundamental output object of the AHBA that can facilitate a range of analyses, including those we examined in the paper (i.e., correlated gene expression, gene co-expression, and regional gene expression or gene-of-interest analyses).

    In the manuscript, we examined the outputs of various pipelines only to highlight the potential variability of results as a function of parameter selection; however, in most use cases, we would recommend that researchers only use abagen to run a single pipeline, yielding one brain region x gene expression matrix that they can carry forward to their desired analyses. Selecting different parameters when using abagen will modify the shape or values of this matrix, but not the structure.

    To clarify this we have added the following text to the Results (section "Standardized processing and reporting with the abagen toolbox"):

    "The main output of abagen is a single brain region (or tissue sample) x gene expression matrix. Changing the parameters may modify the shape of the matrix (e.g., different atlases will yield different numbers of regions or samples) or different values (e.g., different processing choices may yield different numbers of genes), but not the structure."
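The shape-versus-structure distinction in the quoted text can be illustrated with a toy sketch (not the abagen API; the atlas sizes and gene counts below are hypothetical): different parameter choices change the dimensions of the output, but every pipeline still returns a plain 2-D region x gene matrix.

```python
import numpy as np

rng = np.random.default_rng(42)

def mock_expression_matrix(n_regions, n_genes):
    """Stand-in for a pipeline's output: one region x gene matrix."""
    return rng.normal(size=(n_regions, n_genes))

# hypothetical outputs from two different "pipelines"
out_a = mock_expression_matrix(n_regions=68, n_genes=15633)   # e.g., coarse atlas
out_b = mock_expression_matrix(n_regions=200, n_genes=12986)  # e.g., finer atlas,
                                                              # stricter gene filter

# shapes differ across pipelines, but the structure is identical:
# rows index regions, columns index genes, in a single 2-D matrix
print(out_a.shape, out_b.shape)
```

Downstream code that expects a region x gene matrix therefore works unchanged regardless of which processing options were selected.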

The second point of importance, I believe, is more description of the available functionality in the toolbox, perhaps as a more specific use-case analysis. The authors provide substantial documentation on installing and working with Abagen, but some more direct indication of how the toolkit would be used would be valuable.

    We agree that it is important to clearly lay out the functionality of the toolbox in the manuscript. We have modified the following paragraph to the Results (Standardized processing and reporting with the abagen toolbox) to elaborate on the tools made available to researchers in abagen:

    “The abagen toolbox supports two use-case driven workflows: (1) a workflow that accepts an atlas and returns a parcellated, preprocessed regional gene expression matrix (Fig. 4a); and, (2) a workflow that accepts a mask and returns preprocessed expression data for all tissue samples within the mask (Fig. 4b). Workflows can be called via a single line of code from either the command line or Python terminal, and take approximately one minute to run with default settings using the Desikan-Killiany atlas. The main output of abagen is a single brain region (or tissue sample) x gene expression matrix. Changing the parameters may modify the shape of the matrix (e.g., different atlases will yield different numbers of regions or samples) or different values (e.g., different processing choices may yield different numbers of genes), but not the structure. The outputs of these workflows can be used generally to examine the three prototypical research questions enabled by the AHBA: correlated gene expression, gene co-expression, and regional expression of genes of interest more broadly (Fornito et al., 2019, Trends Cogn Sci). Beyond its primary workflows, abagen has additional functionality for post-processing the AHBA data (e.g., removing distance-dependent effects from expression data, calculating differential stability estimates; Hawrylycz et al., 2015, Nat Neuro), and for accessing data from the companion Allen Mouse Brain Atlas (e.g., providing interfaces for querying the Allen Mouse API; https://mouse.brain-map.org/; Lein et al., 2007, Nature).”

    As we envision the abagen software to continue to develop in the coming years, we have purposefully omitted the inclusion of code examples in the current manuscript as the API is liable to change over time. To ensure that these examples stay up-to-date with the abagen API, we only include code in the online abagen documentation (https://abagen.readthedocs.io; citable via Zenodo; https://zenodo.org/record/3726257), which can be continuously updated along with the software package.

  2. Evaluation Summary:

    This paper will be of interest to scientists studying the large-scale transcriptomic organization of the human brain, and in particular those who have used or plan to use the Allen Human Brain Atlas dataset. The study is well-motivated and novel. The most striking finding is the magnitude of variability that is introduced by different data-processing decisions. The open-source software described in this study is an important contribution to the field and will be of broad utility.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1, Reviewer #2 and Reviewer #3 agreed to share their name with the authors.)

  3. Reviewer #1 (Public Review):

    The authors present a comprehensive, well documented, and easy to use toolbox for processing and analysing gene expression data for comparisons with neuroimaging data.

    The tool is well designed and well documented. In the paper, it is used to show how different choices of processing can affect the outcome of 3 different types of gene expression analyses. The fact that they can do such analyses, as well as replicate published analyses and examine the outcome of their processing choices nicely illustrates how flexible this toolbox is.

  4. Reviewer #2 (Public Review):

    In this manuscript, Markello and colleagues exhaustively characterize the impact and relative importance of the many data-processing decisions that go into constructing whole-brain transcriptomic maps from microarray data in the Allen Human Brain Atlas. The authors motivate the need for and have developed an open-source toolbox, abagen, for standardizing workflows in imaging transcriptomics. The authors propose a taxonomy of analyses commonly performed on these data in the literature; they then use abagen to compute the distributions of statistical outcomes for three prototypical analyses across 750,000 combinatorial choices of end-to-end data-processing pipelines. Informed by these findings, the authors then place into context several specific pipelines reported in recent and influential studies.

    The paper is well-written and the authors are successful in illustrating and attempting to address the need for standardized and systematic research in the burgeoning field of imaging transcriptomics. The abagen toolbox is an important contribution and is to my knowledge the current state-of-the-art. The code is clean, flexible, and very well-documented. The chief weakness of this paper is the lack of clear guidance on best practices. Readers should, however, be sympathetic to the fact that there is currently a lack of ground-truth data against which to benchmark different data-processing pipelines.

    Even after reading the paper thoroughly, it's still not completely clear to me whether the analyses in this study are performed for cortex only, or at the whole-brain level (or bi- or uni-laterally for that matter). I'm assuming this study is cortex-only as you say in the methods that "the brain atlas used in the current manuscript represents only cortical parcels." But abagen supports joint cortical+subcortical atlases too. It'd be helpful to readers to make this explicit. Along similar lines, do you expect any of the main findings of this study to change when deriving whole-brain maps?

    Would it make sense to use PET maps or another type of neuroimaging data as a (pseudo-)benchmark in a future study? What about a cross-validation strategy where data are selectively withheld during processing and then predicted after the fact? This may only be possible for a subset of genes and/or pipelines, but it could nonetheless be informative.

    In the discussion, you claim that "the optimal set of processing parameters will very likely vary based on research question." I'd like to see this elaborated on a bit further, at least for the most important parameters. For example, when would it make more sense to use one form of gene normalization over the other? What are the implicit assumptions underlying each choice?

    Is there anything to be said about the order of operations? There seem to be several steps in Table 1 which could conceivably be interchanged. If nothing else, this procedural ambiguity is yet another good reason to standardize workflows.

    I particularly liked the analysis in Figure 2A and thought it made a nice contribution to the paper.

  5. Reviewer #3 (Public Review):

The work “Standardizing workflows in imaging transcriptomics with the Abagen toolbox” is a major meta-analysis pipeline workflow for comparing and integrating parameter choices in imaging transcriptomics using the Allen Human Brain Atlas (AHBA). The release of the AHBA has strongly increased interest in determining transcriptomic associations in brain imaging studies, yet there is much variability in the analyses, methods used, and subsequent interpretation.

This work is illustrative of an important trend in informatics analysis: allowing users strong metadata control so as to access and implement optimal parameter choices and to study their distribution. The work, implemented as an open-source Python toolkit, is likely to be of importance to analysts working in these areas.

It would be helpful to clarify and specifically define the term “pipeline” as a specific set of parameter, normalization, and other choices that are selected. Whereas this term is in common use in the field, in the present work the meaning is specific to a set of selectable options. Of course, any number of such variable selections could be implemented in the Abagen toolbox; it will help for clarity to define this term up front.

My major consideration in this work concerns two issues. The first is how to characterize and summarize the pipeline output produced by Abagen. The manuscript illustrates the workflows and various means of summarizing results but does not offer guidance on the preferred interpretation or relative value of the results. Whereas we may argue that the primary purpose of Abagen is to run the various pipelines, allowing downstream interpretation by the user, it would be helpful to understand how the Abagen toolbox organizes, summarizes, and sets these output options up for interpretation. This appears to be only weakly addressed in the present manuscript.

The second point of importance, I believe, is more description of the available functionality in the toolbox, perhaps as a more specific use-case analysis. The authors provide substantial documentation on installing and working with Abagen, but some more direct indication of how the toolkit would be used would be valuable.

    The scale of this work is impressive and the work may be widely used by the neuroimaging community.

I am enthusiastic about this work but would like to see more description of how the Abagen toolbox might be used to better converge on more strongly interpretable results. I certainly understand that the issue of ground truth remains open, but it would seem that the toolkit might be able to summarize and/or statistically prioritize pipeline results for users so as to afford better interpretation.

A second point concerns providing somewhat more description of what is actually available functionally in the toolkit, at least as a summary.