Evaluation of replication success: A statistical perspective and extended analysis of the SCORE data
Abstract
Replications are increasingly conducted across multiple research fields. A recent large-scale replication effort, the Systematizing Confidence in Open Research and Evidence (SCORE) project, replicated 274 effects from research in business, economics, education, political science, psychology, and sociology. How replication success can best be assessed remains an open question. Thirteen methods for assessing replication success were applied in the SCORE project. The first goal of this paper is to study the statistical properties of these methods in a numerical study tailored to the characteristics of the SCORE data. The results show that the methods are unable to reliably draw conclusions about replication success at sample sizes typical of an original study and its replication. Hence, a single original study and a single replication often contain insufficient information for drawing definite conclusions. The second goal of this paper is to take a novel perspective on the SCORE data by estimating the effect size underlying an original study and its replication. Meta-analysis methods that correct for potential bias in the original study yielded close to unbiased estimates when synthesizing the original study and replication. The reanalysis of the SCORE data revealed that the meta-analytic estimates were substantially closer to the effect size of the replication than to that of the original study. The null hypothesis of no effect was rejected by the meta-analysis methods in 59.2% to 81.3% of the pairs of an original study and replication. We recommend that researchers verify that methods for analyzing replicability have favorable statistical properties for the data at hand before applying them.
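The meta-analytic synthesis of one original study and one replication mentioned above can be illustrated with a standard fixed-effect (inverse-variance) pooling of two effect estimates. This is a generic sketch with made-up numbers for the effect estimates and standard errors; it is not one of the bias-correcting methods the paper evaluates, which adjust the original estimate before pooling.

```python
import math


def fixed_effect_meta(theta1, se1, theta2, se2):
    """Inverse-variance (fixed-effect) pooling of two effect estimates.

    theta1, se1: effect estimate and standard error of the original study
    theta2, se2: effect estimate and standard error of the replication
    Returns the pooled estimate, its standard error, and the two-sided
    p-value for the null hypothesis of no effect (normal approximation).
    """
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2  # inverse-variance weights
    theta = (w1 * theta1 + w2 * theta2) / (w1 + w2)
    se = math.sqrt(1.0 / (w1 + w2))
    z = theta / se
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return theta, se, p


# Hypothetical pair: original study with a larger effect but larger
# standard error, replication with a smaller effect and smaller error.
theta, se, p = fixed_effect_meta(0.40, 0.15, 0.10, 0.08)
```

Because the replication has the smaller standard error, it receives the larger weight, so the pooled estimate lies closer to the replication's effect size, mirroring the pattern the paper reports for the SCORE data.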