Differences in score reliability do not explain meta-analytic heterogeneity in standardised effect sizes
Abstract
Heterogeneity in standardised effect sizes, such as Cohen's d, is informative because it indicates that a psychological phenomenon is not yet fully understood. Prior work by Hunter and Schmidt, as well as Wiernik and Dahlke, has argued that such heterogeneity partly reflects differences in measurement precision, and that correcting for these differences using attenuation correction should therefore reduce heterogeneity. We reanalyse data from large-scale collaborative projects, including the ManyLabs studies, Registered Replication Reports, and the Psychological Science Accelerator, and demonstrate that this assumption does not hold universally. Treating standardised effect sizes as ratio variables, we offer an alternative account of how differences in measurement precision relate to observed heterogeneity. Across many psychological research contexts, score reliability tends to be high and measurement error relatively stable. Under these conditions, when samples differ substantially in their composition or diversity, attenuation correction procedures are more likely to increase rather than decrease heterogeneity. We conclude by discussing the limited relevance of random measurement error for explaining effect size heterogeneity in the context of convenience sampling.