Are our 95% CIs only worth 45% confidence?

Ulrich Knief
Wolfgang Forstmeier

Abstract

When multiple studies on the same research question, or multiple analyses of the same dataset, are summarized in a meta-analysis, our confidence intervals (CIs) are put to a relentless test of reliability. While we might hope that 95% of the CIs would contain the meta-analytic mean value (and therefore presumably also the true value), a recent meta-analysis of 512 meta-analyses in ecology and evolution suggests that only a sobering 45% of them do. As this paradox of overconfidence continues to confuse researchers, we attempt to explain where most of the heterogeneity in findings might come from. On the one hand, heterogeneity is unsurprising, because conventional 95% CIs reflect only the uncertainty due to sampling noise and not any other source of error, such as arbitrary analysis decisions (“model uncertainty”). Once we recognize multiple sources of error beyond sampling noise, the replication crisis follows logically from an anticonservative statistical practice that permits overinterpretation beyond the legitimate inference space, which is in fact narrower than commonly acknowledged. We explain how to calculate extended confidence intervals (CI_ext) that also cover other sources of biological and analytical heterogeneity, and we clarify which CI_ext is valid for which extended inference space. We further show how multiple analyses of the same dataset can be merged into a many-analyses meta-analysis (MAMA), which yields a CI that accounts for two sources of error. On the other hand, we also recognize that analysts who summarize multiple results (e.g. meta-analyses, multiverse analyses, many-analyst studies) will often find high levels of heterogeneity because they “compare apples and oranges”, either in terms of the statistical metric or of the biological interpretation. Therefore, the estimate of only 45% deserved confidence appears far too pessimistic.
Overall, we need considerable caution in interpreting both confidence intervals and estimates of heterogeneity, and we need to become better at keeping apples and oranges separated.
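The abstract's central point is that a conventional 95% CI reflects only sampling noise, so it undercovers the mean whenever true effects also vary between studies or analyses. A small simulation can illustrate this. The sketch below is our own illustration and not the preprint's procedure. It assumes normally distributed true effects with a hypothetical between-study SD `tau` and a common standard error `se`. It then widens each interval by a DerSimonian-Laird estimate of tau^2, which is one common random-effects approach to building an "extended" interval. All function names and parameter values are hypothetical.

```python
# Illustrative sketch under an assumed normal random-effects model;
# function names and settings are hypothetical, not the preprint's method.
import math
import random

Z95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def expected_coverage(se, tau):
    """Analytic share of conventional CIs (y +/- Z95*se) that contain the
    grand mean when true effects vary between studies with SD tau."""
    z = Z95 * se / math.sqrt(se**2 + tau**2)
    return math.erf(z / math.sqrt(2))  # equals 2*Phi(z) - 1


def simulate_studies(k, mean, tau, se, seed=1):
    """Hypothetical data: each study has its own true effect
    theta_i ~ N(mean, tau^2) and reports y_i ~ N(theta_i, se^2)."""
    rng = random.Random(seed)
    return [rng.gauss(rng.gauss(mean, tau), se) for _ in range(k)]


def dl_tau2(y, se):
    """DerSimonian-Laird moment estimate of the between-study variance."""
    w = [1.0 / s**2 for s in se]
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi**2 for wi in w) / sw
    return max(0.0, (q - (len(y) - 1)) / c)


def coverage(y, se, mean, tau2=0.0):
    """Share of intervals y_i +/- Z95*sqrt(se_i^2 + tau2) containing mean.
    tau2=0 gives conventional CIs; tau2>0 gives widened ("extended") CIs."""
    hits = sum(abs(yi - mean) <= Z95 * math.sqrt(si**2 + tau2)
               for yi, si in zip(y, se))
    return hits / len(y)


if __name__ == "__main__":
    k, mean, tau, s = 5000, 0.3, 0.2, 0.1  # assumed illustrative values
    y = simulate_studies(k, mean, tau, s)
    se = [s] * k
    tau2_hat = dl_tau2(y, se)
    print(f"expected conventional coverage: {expected_coverage(s, tau):.3f}")
    print(f"simulated conventional coverage: {coverage(y, se, mean):.3f}")
    print(f"estimated tau^2: {tau2_hat:.4f} (true {tau**2:.4f})")
    print(f"widened-CI coverage: {coverage(y, se, mean, tau2_hat):.3f}")
```

With these hypothetical settings (tau twice the sampling SE), the conventional intervals cover the known mean only about 62% of the time. Widening them by the estimated heterogeneity restores roughly nominal coverage. The sketch does not address the harder questions raised in the abstract: which sources of variation define the inference space, how non-independent analyses of one dataset should be combined in a MAMA, or whether pooled results are comparable at all.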

Version published on bioRxiv (doi: 10.1101/2025.08.05.668631), Aug 7, 2025