Extent, impact, and mitigation of batch effects in tumor biomarker studies using tissue microarrays

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    Tissue microarrays (TMAs) have become a mainstay in clinical and basic research, for both discovery and validation of biomarkers. This manuscript provides relevant methodologic considerations for cancer researchers investigating tissue biomarkers using TMAs. A comprehensive investigation combining analyses of empirical and simulated data supports the key findings and conclusions. The authors approach the possible sampling variation in a thoughtful way, not only quantifying the issue systematically but also working towards a solution.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

Abstract

Tissue microarrays (TMAs) have been used in thousands of cancer biomarker studies. The extent to which batch effects (measurement error in biomarker levels between slides) affect TMA-based studies has not been assessed systematically. We evaluated 20 protein biomarkers on 14 TMAs with prospectively collected tumor tissue from 1448 primary prostate cancers. For half of the biomarkers, more than 10% of biomarker variance was attributable to between-TMA differences (range, 1–48%). We implemented different methods to mitigate batch effects (R package batchtma) and tested them in plasmode simulation. Biomarker levels were more similar across the different mitigation approaches than to the uncorrected values. For some biomarkers, associations with clinical features changed substantially after addressing batch effects. Batch effects and the resulting bias are not an error of an individual study but an inherent feature of TMA-based protein biomarker studies. They always need to be considered during study design and addressed analytically in studies using more than one TMA.
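The share of biomarker variance attributable to between-TMA differences can be expressed as a between-TMA intraclass correlation coefficient (ICC), the term the author response uses below. The following is a minimal Python sketch, assuming a one-way random-effects ANOVA estimator with TMA as the grouping factor; the function name `between_tma_icc` is hypothetical, and the study itself may have estimated the ICC differently (for example, with a mixed-effects model).

```python
from statistics import fmean


def between_tma_icc(values_by_tma):
    """Share of biomarker variance attributable to between-TMA differences.

    One-way random-effects ANOVA estimator; TMAs may hold different numbers
    of cores. Needs at least two TMAs and some within-TMA variation.
    """
    groups = [list(v) for v in values_by_tma.values() if len(v) > 0]
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]

    # Between-TMA and within-TMA mean squares
    ms_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, group_means)) / (k - 1)
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g) / (n_total - k)

    # Effective number of cores per TMA for unbalanced designs
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    # Variance component between TMAs, truncated at zero
    var_between = max(0.0, (ms_between - ms_within) / n0)
    return var_between / (var_between + ms_within)


print(round(between_tma_icc({"TMA_A": [1, 2, 3], "TMA_B": [4, 5, 6]}), 3))
```

For two hypothetical TMAs with values [1, 2, 3] and [4, 5, 6], the sketch returns 25/31 (about 0.81), meaning roughly 81% of the variance lies between TMAs. If both TMAs have identical values, it returns 0.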

Article activity feed

  1. Author Response:

    Reviewer #2 (Public Review):

    Tissue microarrays have become a mainstay in clinical and basic research, for both discovery and validation of biomarkers. The authors approach the possible sampling variation in a thoughtful way, not only quantifying the issue systematically but also working towards a solution.

    Major Comments:

    o The authors split the variation into two co-existing explanations, either intratumoral heterogeneity or batch effects (likely both play a role to some degree). Batch correction inherently reduces noise (the latter) at the cost of reducing signal (the former). It would be useful to know what approaches have been employed to test for overfitting. The authors state in the introduction that different methods are used to maintain "biological" variation, but that analysis seems limited.

    We agree that overfitting is a potential concern for any model. With a large number of tumor cores per batch, overfitting is less likely to arise if few parameters are estimated per batch. We consider overfitting of the adjustment models a separate problem from overadjustment, which would remove biological variation and which depends on how well batches are balanced with respect to biological factors. The results from our simulations (Fig. 5, Fig. 5–figure supplement 1) address the latter. “Biological variation” between TMAs was maintained in each simulated data set (Fig. 5–figure supplement 1). All mitigation approaches were more successful than not addressing batch effects in recovering the true association (Fig. 5).
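    Overadjustment can be made concrete with a small example. The following sketch uses hypothetical, noise-free values rather than study data. Grade raises the biomarker by 1.0, TMA 2 carries a technical shift of 0.5, and TMA 2 contains mostly high-grade tumors, so the batches are unbalanced on a biological factor. The sketch compares naive per-TMA mean centering with a shift estimated within grade strata. It is an illustrative sketch under these assumptions, not the batchtma implementation, and all function names are hypothetical.

```python
from statistics import fmean

# Hypothetical toy values, not study data; there is no noise, so effects are exact.
TRUE_GRADE_EFFECT = 1.0       # assumed biological difference (high vs. low grade)
TMA_SHIFT = {1: 0.0, 2: 0.5}  # assumed technical offset for each TMA


def make_rows():
    """Return (tma, grade, biomarker) rows; TMA 2 holds mostly high-grade tumors."""
    rows = []
    for tma, n_low, n_high in [(1, 8, 2), (2, 2, 8)]:
        for grade, count in [(0, n_low), (1, n_high)]:
            value = TRUE_GRADE_EFFECT * grade + TMA_SHIFT[tma]
            rows.extend([(tma, grade, value)] * count)
    return rows


def grade_difference(rows):
    """Pooled mean biomarker level in high-grade minus low-grade tumors."""
    high = [v for _, g, v in rows if g == 1]
    low = [v for _, g, v in rows if g == 0]
    return fmean(high) - fmean(low)


def center_by_tma(rows):
    """Naive correction: subtract each TMA's overall mean from its cores."""
    tmas = {t for t, _, _ in rows}
    tma_mean = {t: fmean([v for tt, _, v in rows if tt == t]) for t in tmas}
    return [(t, g, v - tma_mean[t]) for t, g, v in rows]


def adjust_within_grade(rows, reference_tma=1):
    """Estimate each TMA's shift versus a reference TMA within grade strata,
    average the stratum-specific shifts, and subtract only that shift.
    Requires every TMA to contain tumors of every grade."""
    tmas = {t for t, _, _ in rows}
    grades = {g for _, g, _ in rows}

    def stratum_mean(t, g):
        return fmean([v for tt, gg, v in rows if tt == t and gg == g])

    shift = {
        t: fmean([stratum_mean(t, g) - stratum_mean(reference_tma, g) for g in grades])
        for t in tmas
    }
    return [(t, g, v - shift[t]) for t, g, v in rows]


rows = make_rows()
print(round(grade_difference(rows), 2))                       # uncorrected
print(round(grade_difference(center_by_tma(rows)), 2))        # naive centering
print(round(grade_difference(adjust_within_grade(rows)), 2))  # stratified shift
```

    In this toy setting, the uncorrected grade difference is 1.3 because the batch shift is confounded with grade. Naive centering attenuates it to 0.64 because it also removes biological between-TMA variation. The stratified adjustment recovers the assumed value of 1.0.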

    o Were there considerations for the variability in Gleason scoring between members of the study team?

    We agree that this is an important consideration. Gleason scores in our study are from a centralized, standardized re-review of full tissue sections performed before the TMAs were constructed. The TMAs use cores from the highest-density tumor regions. See Stark et al. (JCO 2009, cited in the manuscript) for how variability was removed.

    o The manuscript involves the processing of a number of different cohorts in the field of prostate cancer. It would be important to know how the performance of the batchtma approach would change in tumors with greater heterogeneity.

    We do not have additional empirical data. We would like to emphasize that there is substantial heterogeneity within the large prostate cancer case series that we analyzed, which was sampled from population-based cohorts. Moreover, in the last paragraph of the section “Validation batch effect mitigation in plasmode simulation,” we tested the methods implemented in the batchtma package in simulations that involved scenarios with far greater heterogeneity than empirically observed (Figure 5–figure supplement 3; the actual data on biomarkers with high between-TMA ICCs correspond to the setting “some confounding”).
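    A plasmode simulation starts from realistic data and injects known structure, so the true answer is known in advance. The following sketch follows that logic under several assumptions. Standard-normal values stand in for observed, batch-free biomarker levels. An outcome is generated with a known slope `beta`. Cores are randomly assigned to TMAs, and each TMA receives a technical offset with standard deviation `sd_batch`, a hypothetical knob for the size of between-TMA differences. The sketch then compares the average estimated slope with and without per-TMA mean centering. It is illustrative only and does not reproduce the authors' simulation scenarios.

```python
import random
from statistics import fmean


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def center_within_tma(values, tma):
    """Naive batch correction: subtract each TMA's mean from its own cores."""
    groups = {}
    for v, t in zip(values, tma):
        groups.setdefault(t, []).append(v)
    tma_mean = {t: fmean(vs) for t, vs in groups.items()}
    return [v - tma_mean[t] for v, t in zip(values, tma)]


def simulate(n_reps=200, n_tumors=500, n_tmas=10, beta=1.0, sd_batch=1.0, seed=1):
    """Average slope estimates (uncorrected, centered) across repetitions."""
    rng = random.Random(seed)
    slopes_raw, slopes_centered = [], []
    for _ in range(n_reps):
        # Stand-in for observed, batch-free biomarker values (assumption).
        truth = [rng.gauss(0.0, 1.0) for _ in range(n_tumors)]
        # Outcome with a known true association beta.
        outcome = [beta * x + rng.gauss(0.0, 1.0) for x in truth]
        # Random core-to-TMA assignment and one technical offset per TMA.
        tma = [rng.randrange(n_tmas) for _ in range(n_tumors)]
        offset = [rng.gauss(0.0, sd_batch) for _ in range(n_tmas)]
        measured = [x + offset[t] for x, t in zip(truth, tma)]
        slopes_raw.append(ols_slope(measured, outcome))
        slopes_centered.append(ols_slope(center_within_tma(measured, tma), outcome))
    return fmean(slopes_raw), fmean(slopes_centered)


raw, centered = simulate()
print(f"uncorrected slope: {raw:.2f}, centered slope: {centered:.2f}")
```

    With `sd_batch=1.0`, the uncorrected slope falls well below `beta`, because the TMA offsets act as measurement error (regression dilution). The centered slope stays close to `beta`. With `sd_batch=0.0`, both estimates are close to `beta`. Cores are assigned to TMAs at random here, so centering removes only technical variation. When TMAs differ biologically, naive centering can also remove biological variation, which is the overadjustment concern raised by the reviewer.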

  2. Reviewer #1 (Public Review):

    Tissue microarrays (TMAs) are a critical tool for conducting tissue-biomarker research. In this report, the authors investigated whether technical aspects of TMA-based investigations contribute to batch effects (i.e., differences in the values of biomarkers measured in tumor samples due to non-biological factors) and tested multiple ways to correct for the measurement error resulting from batch effects.

    Using data generated from 20 prostate cancer biomarker investigations using 14 different TMAs that included tumor tissue from over 1400 men with prostate cancer, the investigators determined that tumor characteristics such as stage, grade, and date of diagnosis do not contribute to the batch effects observed across the 14 TMAs. These findings may not generalize to all potential tissue biomarkers investigated using TMAs, to TMAs developed using different protocols for patient selection and tissue acquisition, preservation, and TMA construction, to TMAs with smaller sample sizes, or to other cancer types.

    The authors then evaluated six different statistical methods to correct the measurement error due to batch effects. The strengths and limitations of each method are discussed. An overall strength of this study is the availability of empirical data generated from 20 biomarker investigations using the same TMAs to identify which statistical method leads to the most valid (i.e., true) biomarker estimates. Data simulations were used to determine how each correction method influenced the biomarker–cancer outcome relationship. This is another strength of the investigation, which provides a robust assessment of different statistical approaches to overcoming the influence of batch effects using both empirical and simulated data.

    The authors' conclusion that batch effects are not an error of an individual study but a feature of TMA-based research is supported by the reported results. Although the extent of potential bias introduced by batch effects varies between studies, based on the data reported, the authors' recommendations are well supported and will contribute to improving the validity of tissue-biomarker investigations using TMAs.

  3. Reviewer #2 (Public Review):

    Tissue microarrays have become a mainstay in clinical and basic research, for both discovery and validation of biomarkers. The authors approach the possible sampling variation in a thoughtful way, not only quantifying the issue systematically but also working towards a solution.

    Major Comments:
    o The authors split the variation into two co-existing explanations, either intratumoral heterogeneity or batch effects (likely both play a role to some degree). Batch correction inherently reduces noise (the latter) at the cost of reducing signal (the former). It would be useful to know what approaches have been employed to test for overfitting. The authors state in the introduction that different methods are used to maintain "biological" variation, but that analysis seems limited.
    o Were there considerations for the variability in Gleason scoring between members of the study team?
    o The manuscript involves the processing of a number of different cohorts in the field of prostate cancer. It would be important to know how the performance of the batchtma approach would change in tumors with greater heterogeneity.