Are our 95% CIs only worth 45% confidence?

Abstract

When multiple studies on the same research question, or multiple analyses of the same dataset, are summarized in a meta-analysis, our confidence intervals (CIs) are put to a relentless test of reliability. While we might hope that 95% of the CIs would contain the meta-analytic mean value (and therefore presumably also the true value), a recent meta-analysis of 512 meta-analyses in ecology and evolution suggests that only a sobering 45% of them do. Because this paradox of overconfidence continues to confuse researchers, we attempt to explain where most of the heterogeneity in findings might come from. On the one hand, heterogeneity is unsurprising because conventional 95% CIs reflect only the uncertainty due to sampling noise, not any other source of error such as arbitrary analysis decisions (“model uncertainty”). Given these multiple sources of error beyond sampling noise, the replication crisis follows logically from an anticonservative statistical practice that permits overinterpretation beyond the legitimate inference space, which is in fact narrower than commonly acknowledged. We explain how to calculate extended confidence intervals (CI_ext) that also cover other sources of biological and analytical heterogeneity, and we clarify which CI_ext is valid for which extended inference space. We further show how multiple analyses of the same dataset can be merged into a many-analyses meta-analysis (MAMA), which yields a CI that accounts for two sources of error. On the other hand, we also recognize that analysts who summarize multiple results (e.g. in meta-analyses, multiverse analyses, or many-analyst studies) will often find high levels of heterogeneity because they “compare apples and oranges”, either in terms of the statistical metric or of the biological interpretation. Therefore, the estimate of only 45% deserved confidence appears far too pessimistic.
Overall, we need considerable caution in interpreting both confidence intervals and estimates of heterogeneity, and we need to become better at keeping apples and oranges separated.
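The abstract does not spell out its formulas, so what follows is only a minimal Python sketch under standard random-effects assumptions that we introduce ourselves. Each study's true effect is drawn from N(mu, tau^2), and its estimate y adds sampling noise with a known standard error se. Under that model, a conventional CI of y ± 1.96·se covers mu with probability 2Φ(1.96·se / √(se² + tau²)) − 1, and widening the half-width to 1.96·√(se² + tau²) restores nominal coverage. The function names `naive_coverage`, `extended_ci`, `simulate_coverage` and `mama_ci` are hypothetical. `mama_ci` in particular is just one simple way to add a between-analysis ("model uncertainty") variance to the average sampling variance. Neither function is necessarily the CI_ext or MAMA formula derived in the article.

```python
import math
import random
import statistics

# Two-sided 95% quantile of the standard normal distribution.
Z95 = 1.959963984540054


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def naive_coverage(se: float, tau: float) -> float:
    """Probability that a sampling-only 95% CI, y +/- Z95*se, contains the
    overall mean mu when y - mu ~ N(0, se^2 + tau^2) (assumed model)."""
    total_sd = math.sqrt(se**2 + tau**2)
    return 2.0 * normal_cdf(Z95 * se / total_sd) - 1.0


def extended_ci(y: float, se: float, tau: float) -> tuple[float, float]:
    """Hypothetical CI_ext: add the between-study variance tau^2 to the
    sampling variance se^2 before computing the 95% half-width."""
    half_width = Z95 * math.sqrt(se**2 + tau**2)
    return y - half_width, y + half_width


def simulate_coverage(mu: float, se: float, tau: float, n: int, seed: int = 1):
    """Monte Carlo check: fraction of naive and extended CIs containing mu."""
    rng = random.Random(seed)
    naive_hits = ext_hits = 0
    for _ in range(n):
        theta = rng.gauss(mu, tau)  # study-specific true effect
        y = rng.gauss(theta, se)    # observed estimate with sampling noise
        if abs(y - mu) <= Z95 * se:
            naive_hits += 1
        lo, hi = extended_ci(y, se, tau)
        if lo <= mu <= hi:
            ext_hits += 1
    return naive_hits / n, ext_hits / n


def mama_ci(estimates: list[float], std_errors: list[float]):
    """Hypothetical many-analyses summary of ONE dataset: the pooled estimate
    is the mean across analyses, and the CI adds the between-analysis
    variance (model uncertainty) to the mean sampling variance."""
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two analyses with matching SEs")
    pooled = statistics.fmean(estimates)
    sampling_var = statistics.fmean(se**2 for se in std_errors)
    model_var = statistics.variance(estimates)  # spread due to analysis choices
    half_width = Z95 * math.sqrt(sampling_var + model_var)
    return pooled, (pooled - half_width, pooled + half_width)


if __name__ == "__main__":
    for tau in (0.0, 1.0, 3.1):
        print(f"tau={tau}: naive coverage {naive_coverage(1.0, tau):.3f}")
    print(simulate_coverage(mu=0.0, se=1.0, tau=3.0, n=20_000))
    print(mama_ci([0.21, 0.35, 0.18, 0.40], [0.08, 0.09, 0.07, 0.10]))
```

Under this toy model, naive coverage falls from 0.95 at tau = 0 to about 0.83 when tau equals the standard error, and to about 0.45 when tau is roughly three times the standard error. The extended interval stays near 95% by construction. This is arithmetic on an assumed model, not a reproduction of the 512-meta-analysis result. As the abstract cautions, part of any estimated tau may reflect comparing apples and oranges rather than genuine uncertainty.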