Differences in score reliability do not explain meta-analytic heterogeneity in standardised effect sizes
Abstract
Heterogeneity in standardised effect sizes, such as Cohen’s d, provides evidence that a phenomenon varies across contexts or populations. One influential methodological explanation for such variation is that it arises from differences in measurement precision across studies. Such an explanation implies that correcting effect sizes for attenuation due to score unreliability should reduce between-study variability. We reanalyse data on a wide range of phenomena from large-scale collaborative projects, including the ManyLabs studies, Registered Replication Reports, and the Psychological Science Accelerator. Applying conventional attenuation correction procedures, we find that differences in score reliability are small across data sets and cannot explain effect size heterogeneity. Contrary to prevailing claims in the methodological literature on research synthesis, attenuation correction procedures increased rather than decreased heterogeneity for most phenomena. We therefore propose an alternative framework that treats standardised effect sizes as ratio variables to better understand how differences in score reliability relate to heterogeneity. Based on this framework, we identify conditions under which attenuation correction procedures will necessarily increase effect size heterogeneity. Our reanalyses imply that between-study differences in underlying true score variances dominate the comparatively small variation in random measurement error. Such findings suggest that unreliability is unlikely to be a major driver of effect size heterogeneity for phenomena comparable to the ones reanalysed here. More broadly, the results suggest that meta-analysts should not expect that effect size heterogeneity can be reduced or explained by controlling for differences in measurement precision.
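The ratio logic described above can be illustrated with a minimal sketch. It is not the authors' analysis. It assumes the classical correction d / sqrt(reliability), with reliability defined as true score variance divided by observed score variance, and it uses three invented studies that share the same raw mean difference and the same error variance but differ in true score variance. All function names and numbers are hypothetical, and the population standard deviation of the effect sizes serves only as a crude stand-in for a formal heterogeneity estimate.

```python
import math
import statistics


def observed_d(mean_diff, true_var, error_var):
    """Standardised effect: raw mean difference over the observed score SD."""
    return mean_diff / math.sqrt(true_var + error_var)


def reliability(true_var, error_var):
    """Classical test theory reliability: true variance / observed variance."""
    return true_var / (true_var + error_var)


def corrected_d(d_obs, rel):
    """Classical attenuation correction (assumed here): divide by sqrt(reliability)."""
    return d_obs / math.sqrt(rel)


# Hypothetical studies: (raw mean difference, true score variance, error variance).
# Only the true score variance differs between studies.
studies = [(1.0, 1.0, 1.0), (1.0, 4.0, 1.0), (1.0, 9.0, 1.0)]

d_obs = [observed_d(m, t, e) for m, t, e in studies]
rels = [reliability(t, e) for _, t, e in studies]
d_corr = [corrected_d(d, r) for d, r in zip(d_obs, rels)]

# Crude spread measure: population SD of the effect sizes across studies.
spread_obs = statistics.pstdev(d_obs)
spread_corr = statistics.pstdev(d_corr)

print("observed d: ", [round(d, 3) for d in d_obs])
print("reliability:", [round(r, 3) for r in rels])
print("corrected d:", [round(d, 3) for d in d_corr])
print("spread before correction:", round(spread_obs, 3))
print("spread after correction: ", round(spread_corr, 3))
```

In this toy setting the corrected effect equals the raw mean difference divided by the true score standard deviation, so removing the constant error variance exposes the between-study differences in true score variance and the spread of effect sizes grows rather than shrinks.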