The role of reliability in experiments

Abstract

Psychological experiments are routinely used to study covariation of individual differences in psychological processes. A key advantage of experimental tasks is that they may offer higher construct validity than traditional survey instruments. Although we welcome the growing attention to the psychometric evaluation of experimental tasks, we are concerned about an overemphasis on reliability as the principal criterion for evaluating task goodness or interpreting correlations across tasks. In this article, we present a conceptual framework that disentangles three levels of measures: foundational measures of task goodness (i.e., signal-to-noise ratios or intraclass correlations), reliability (which reflects both task goodness and the number of trials), and uncertainty in estimated correlations (which additionally depends on sample size). Within this framework, reliability takes an intermediate position – it is useful for planning but less so for communicating task goodness or interpreting correlations. Therefore, we advocate for a shift in focus: Researchers should use foundational task goodness measures to communicate the strengths and limitations of experimental designs, and report uncertainty in correlations. To that end, we highlight the role of hierarchical models in jointly estimating trial-level noise and individual variability, thus enabling more accurate and transparent inference about covariation of individual differences.
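
The three levels described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes the simplest textbook case, in which each person's score is the mean of `n_trials` trials, true scores vary across people with variance `signal_var`, and trial noise has variance `noise_var`. All function and parameter names are hypothetical, and the interval uses the standard Fisher z approximation rather than the hierarchical models the article recommends.

```python
import math


def icc(signal_var, noise_var):
    """Task goodness: share of single-trial variance due to true individual differences."""
    return signal_var / (signal_var + noise_var)


def reliability(signal_var, noise_var, n_trials):
    """Reliability of a person's mean over n_trials trials.

    Averaging shrinks the noise variance by a factor of n_trials,
    so reliability depends on task goodness AND the number of trials.
    """
    return signal_var / (signal_var + noise_var / n_trials)


def attenuated_correlation(true_r, rel_x, rel_y):
    """Expected observed correlation under the classical attenuation formula."""
    return true_r * math.sqrt(rel_x * rel_y)


def fisher_interval(r, n_people, z_crit=1.96):
    """Approximate 95% interval for a correlation; its width depends on sample size."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n_people - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Assumed illustrative values: noise variance nine times the signal variance.
goodness = icc(1.0, 9.0)                 # property of the task alone
rel_9 = reliability(1.0, 9.0, 9)         # same task, 9 trials
rel_90 = reliability(1.0, 9.0, 90)       # same task, 90 trials
observed = attenuated_correlation(0.5, rel_9, rel_9)
low, high = fisher_interval(observed, 103)
```

In this sketch the same task (identical `icc`) yields different reliabilities purely because the trial count changes, which is why reliability alone says little about task goodness, and the interval around the observed correlation narrows only as `n_people` grows.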
