Clarifying the reliability paradox: poor test-retest reliability attenuates group differences


Abstract

The cognitive sciences are grappling with the reliability paradox: measures that robustly produce within-group effects tend to have low test-retest reliability, rendering them unsuitable for studying individual differences. Despite growing awareness of this paradox, its full extent remains underappreciated. Specifically, most research focuses exclusively on how reliability affects correlational analyses of individual differences, while largely ignoring its effects on the study of group differences. Moreover, some studies explicitly and erroneously suggest that poor reliability poses no problem for studying group differences, possibly because they conflate within- and between-group effects. In this short report, we aim to clarify this misunderstanding. Using both data simulations and mathematical derivations, we show how observed group differences are attenuated when measurement reliability is low. We consider multiple scenarios: groups created by thresholding a continuous measure (e.g., patients vs. controls, or a median split), groups defined exogenously (e.g., treatment vs. control groups, or male vs. female), and cases where the observed effect sizes are further affected by differences between the groups in measurement reliability and between-subject variance. Overall, we show that, just as for correlation strength, observed standardized group differences are attenuated as a function of the square root of each measure's reliability. This has important implications for biomarker discovery, clinical translation, and other areas of group differences research that inform policy and real-world applications.
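The square-root attenuation stated in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code; it is a minimal illustration that assumes the simplest scenario only: two exogenously defined groups, normally distributed true scores with unit variance, independent normal measurement error, and equal reliability and between-subject variance in both groups. The function names `cohens_d` and `simulate_observed_d`, the sample size, and the true effect size of 0.8 are all illustrative choices. Under these assumptions the observed standardized difference should come out close to the true difference multiplied by the square root of the reliability.

```python
import math
import random
import statistics


def cohens_d(a, b):
    """Standardized mean difference using the pooled SD (equal group sizes assumed)."""
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)


def simulate_observed_d(true_d, reliability, n_per_group=20000, seed=0):
    """Simulate two groups measured with error and return the observed Cohen's d."""
    rng = random.Random(seed)
    # True scores have unit variance, so the true mean difference equals true_d.
    # Error SD is chosen so that var(true) / var(observed) equals the reliability.
    error_sd = math.sqrt((1 - reliability) / reliability)
    group_a = [rng.gauss(true_d, 1) + rng.gauss(0, error_sd) for _ in range(n_per_group)]
    group_b = [rng.gauss(0, 1) + rng.gauss(0, error_sd) for _ in range(n_per_group)]
    return cohens_d(group_a, group_b)


if __name__ == "__main__":
    true_d = 0.8  # illustrative true standardized group difference
    for rel in (1.0, 0.8, 0.5, 0.3):
        observed = simulate_observed_d(true_d, rel)
        predicted = true_d * math.sqrt(rel)
        print(f"reliability={rel:.1f}  observed d={observed:.3f}  predicted d={predicted:.3f}")
```

The sketch does not cover the other scenarios the abstract mentions: groups formed by thresholding a continuous measure, and groups that differ in reliability or between-subject variance.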
