Determining fragility and robustness to missing data in binary outcome meta-analyses, illustrated with conflicting associations between vitamin D and cancer mortality

Abstract

Meta-analysis is a vital component of clinical decision making, but previous work has found that binary-outcome meta-analytic results can be fragile, hinging on only a small number of patients in specific trials. Meta-analyses can also miss relevant literature, so a method for estimating how much additional unseen data would overturn a result would be a useful tool. This work establishes a complementary and generalisable definition of meta-analytic fragility, based on the Ellipse of Insignificance (EOI) and Region of Attainable Redaction (ROAR) methods originally developed for dichotomous-outcome trials. The method does not require trial-specific alterations to estimate fragility, and it yields a general approach to estimating the robustness of a meta-analysis to data redaction or to the addition of hypothetical trial outcomes. It is applied here to 3 meta-analyses with conflicting findings on the association between vitamin D supplementation and cancer mortality. A full meta-analysis of all trials cited across the 3 meta-analyses yielded no association between vitamin D supplementation and cancer mortality. Using the method outlined here, meta-analytic fragility was found to be high in all cases: recoding just 5 patients in the full cohort of 133,262 patients was enough to cross the significance threshold. Small amounts of redacted or non-included data also had a substantial impact on each meta-analysis, with the addition of just 3 hypothetical patients to an ostensibly significant meta-analysis (N = 38,538) enough to yield a null result. This definition of analytical fragility is complementary to previous investigations suggesting that meta-analyses are frequently fragile, and it further shows that merely increasing the sample size is no assurance against fragility. Caution is advised when interpreting the results of meta-analyses: conflicting results may stem from inherent fragility, and such results should be employed carefully.
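To make the addition-of-unseen-data idea concrete, the sketch below shows the general flavour of such a calculation in base R. It is an illustration only, not the EOIMETA/ROAR implementation described in the article: the function name and example counts are invented, and a simple chi-square test on a naively pooled 2x2 table stands in for the study-level meta-analytic model.

```r
# Hypothetical sketch: how many added patients flip a significant pooled
# result to null? Base R only; not the article's EOIMETA/ROAR code.
additions_to_null <- function(events_tx, n_tx, events_ctl, n_ctl,
                              alpha = 0.05) {
  added <- 0
  repeat {
    tab <- matrix(c(events_tx, n_tx - events_tx,
                    events_ctl, n_ctl - events_ctl),
                  nrow = 2, byrow = TRUE)
    # Naively pooled 2x2 test as a stand-in for the meta-analytic model
    if (chisq.test(tab, correct = FALSE)$p.value >= alpha) return(added)
    # Add one hypothetical treated patient who experienced the event,
    # diluting an apparently protective treatment effect
    events_tx <- events_tx + 1
    n_tx <- n_tx + 1
    added <- added + 1
  }
}

# Invented example counts (not taken from the vitamin D trials)
additions_to_null(events_tx = 400, n_tx = 20000,
                  events_ctl = 470, n_ctl = 20000)
```

On these invented counts, a modest number of hypothetical additions is enough to cross the threshold, echoing the fragility the abstract reports.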

Article activity feed

  1. eLife Assessment

    This manuscript makes a valuable contribution to the concept of fragility of meta-analyses via the so-called 'ellipse of insignificance for meta-analyses' (EOIMETA). The strength of evidence is solid, supported primarily by an example of the fragility of meta-analyses in the association between Vitamin D supplementation and cancer mortality, but the approach could be applied in other meta-analytic contexts. The significance of the work could be enhanced with a more thorough assessment of the impact of between-study heterogeneity, additional case studies, and improved contextualization of the proposed approach in relation to other methods.

  2. Reviewer #1 (Public review):

    Summary:

    This manuscript addresses an important methodological issue, the fragility of meta-analytic findings, by extending fragility concepts beyond trial-level analysis. The proposed EOIMETA framework provides a generalizable and analytically tractable approach that complements existing methods such as the traditional Fragility Index and Atal et al.'s algorithm. The findings are significant in showing that even large meta-analyses can be highly fragile, with results overturned by very small numbers of event recodings or additions. The evidence is clearly presented, supported by applications to vitamin D supplementation trials, and contributes meaningfully to ongoing debates about the robustness of meta-analytic evidence. Overall, the strength of evidence is moderate to strong, though some clarifications would further enhance interpretability.

    Strengths:

    (1) The manuscript tackles a highly relevant methodological question on the robustness of meta-analytic evidence.

    (2) EOIMETA represents an innovative extension of fragility concepts from single trials to meta-analyses.

    (3) The applications are clearly presented and highlight the potential importance of fragility considerations for evidence synthesis.

    Weaknesses:

    (1) The rationale and mathematical details behind the proposed EOI and ROAR methods are insufficiently explained. Readers are asked to rely on external sources (Grimes, 2022; 2024b) without adequate exposition here. At a minimum, the definitions, intuition, and key formulas should be summarized in the manuscript to ensure comprehensibility.

    (2) EOIMETA is described as being applicable when heterogeneity is low, but guidance is missing on how to interpret results when heterogeneity is high (e.g., large I²; the standard definition is recalled in the note after this list). Clarification in the Results/Discussion is needed, and ideally a simulation or illustrative example could be added.

    (3) The manuscript would benefit from side-by-side comparisons between the traditional FI at the trial level and EOIMETA at the meta-analytic level. This would contextualize the proposed approach and underscore the added value of EOIMETA.

    (4) Scope of FI: The statement that FI applies only to binary outcomes is inaccurate. While originally developed for dichotomous endpoints, extensions exist (e.g., Continuous Fragility Index, CFI). The manuscript should clarify that EOIMETA focuses on binary outcomes, but FI, as a concept, has been generalized.
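
    For context on point (2): the I² statistic referred to there is Higgins and Thompson's standard measure, which re-expresses Cochran's Q as the percentage of variability in effect estimates attributable to between-study heterogeneity rather than chance:

    $$I^2 = \max\left(0,\; \frac{Q - (k - 1)}{Q}\right) \times 100\%, \qquad Q = \sum_{i=1}^{k} w_i \left(\hat{\theta}_i - \hat{\theta}\right)^2,$$

    where $k$ is the number of studies, $\hat{\theta}_i$ is the effect estimate from study $i$, $w_i$ is its inverse-variance weight, and $\hat{\theta}$ is the pooled estimate. Larger I² values indicate that more of the observed dispersion reflects genuine between-study differences, which is the regime in which a low-heterogeneity assumption becomes questionable.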

  3. Reviewer #2 (Public review):

    Summary:

    The study expands existing analytical tools originally developed for randomized controlled trials with dichotomous outcomes to assess the potential impact of missing data, adapting them for meta-analytical contexts. These tools evaluate how missing data may influence meta-analyses where p-value distributions cluster around significance thresholds, often leading to conflicting meta-analyses addressing the same research question. The approach quantifies the number of recodings (adding events to the experimental group and/or removing events from the control group) required for a meta-analysis to lose or gain statistical significance. The author developed an R package to perform fragility and redaction analyses and to compare these methods with a previously established approach by Atal et al. (2019), also integrated into the package. Overall, the study provides valuable insights by applying existing analytical tools from randomized controlled trials to meta-analytical contexts.
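
    A minimal sketch of the recoding operation described above, in the same hedged spirit as the sketch following the abstract: base R only, with an invented function name and counts, and a chi-square test on a naively pooled table standing in for the actual meta-analytic model.

    ```r
    # Hypothetical sketch of the recoding operation described above: flip one
    # non-event to an event in the experimental arm and one event to a
    # non-event in the control arm, repeating until significance is lost.
    recodings_to_null <- function(events_tx, n_tx, events_ctl, n_ctl,
                                  alpha = 0.05) {
      recodings <- 0
      repeat {
        tab <- matrix(c(events_tx, n_tx - events_tx,
                        events_ctl, n_ctl - events_ctl),
                      nrow = 2, byrow = TRUE)
        if (chisq.test(tab, correct = FALSE)$p.value >= alpha) return(recodings)
        # Assumes the experimental arm starts with the lower event rate
        events_tx <- events_tx + 1
        events_ctl <- events_ctl - 1
        recodings <- recodings + 1
        if (events_ctl < 0 || events_tx > n_tx) return(NA_integer_)
      }
    }

    recodings_to_null(events_tx = 400, n_tx = 20000,
                      events_ctl = 470, n_ctl = 20000)
    ```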

    Strengths:

    The author's results support his claims. Analyzing the fragility of a given meta-analysis could be a valuable approach for identifying early signs of fragility within a specific topic or body of evidence. If fragility is detected alongside results that hover around the significance threshold, adjusting the significance cutoff as a function of sample size should be considered before making any binary decision regarding statistical significance for that body of evidence. Although the primary goal of meta-analysis is effect estimation, conclusions often still rely on threshold-based interpretations, which is understandable. In some of the examples presented by Atal et al. (2019), the event recoding required to shift a meta-analysis from significant to non-significant (or vice versa) produced only minimal changes in the effect size estimation. Therefore, in bodies of evidence where meta-analyses are fragile or where results cluster near the null, it may be appropriate to adjust the cutoff. Conducting such analyses, identifying fragility early and adapting thresholds accordingly, could help flag fragile bodies of evidence and prevent future conflicting meta-analyses on the same question, thereby reducing research waste and improving reproducibility.

    Weaknesses:

    It would be valuable to include additional bodies of conflicting literature in which meta-analyses have demonstrated fragility. This would allow for a more thorough assessment of the consistency of these analytical tools, their differences, and whether this particular body of literature favored one methodology over another. The method proposed by Atal et al. was applied to numerous meta-analyses and demonstrated consistent performance. I believe there is room for improvement, as both the EOI and ROAR appear to be very promising tools for identifying fragility in meta-analytical contexts.

    I believe the manuscript should be improved in terms of reporting, with clearer statements of the study's and methods' limitations, and by incorporating additional bodies of evidence to strengthen its claims.

  4. Reviewer #3 (Public review):

    Summary and strengths:

    In this manuscript, Grimes presents an extension of the Ellipse of Insignificance (EOI) and Region of Attainable Redaction (ROAR) metrics to the meta-analysis setting, as metrics for evaluating the fragility and robustness of meta-analyses. The author applies these metrics to three meta-analyses of Vitamin D and cancer mortality, finding substantial fragility in their conclusions. Overall, I think the extension/adaptation is a conceptually valuable addition to meta-analysis evaluation, and the manuscript is generally well-written.

    Specific comments:

    (1) The manuscript would benefit from a clearer explanation of the sense in which EOIMETA is generalizable. The author mentions this several times, but without clearly explaining what is meant.

    (2) The author mentions that the proposed tools assume low between-study heterogeneity. Could the author illustrate mathematically in the paper how between-study heterogeneity would influence the proposed measures? Moreover, the between-study heterogeneity is high in Zhang et al.'s 2022 study; this would be a good place to comment on the influence of such high heterogeneity on the results, and specifying a practical heterogeneity cutoff would better guide future users.

    (3) I think clarifying the concepts of "small effect", "fragile result", and "unreliable result" would help prevent misinterpretation by future users. I am concerned that readers may conflate these concepts. A small effect may be related to a fragile meta-analysis result, but a fragile meta-analysis does not necessarily mean wrong or untrustworthy results: a fragile but precise estimate can still reflect a true effect, and whether a true effect of that size is clinically meaningful is a separate question. Distinguishing effect magnitude, fragility, and reliability in the discussion would be helpful.