The role of reliability in experiments

Jeffrey N. Rouder
Mahbod Mehrvarz
Martin Schnuerch

Abstract

Psychological experiments are routinely used to study covariation of individual differences in psychological processes. A key advantage of experimental tasks is that they may offer higher construct validity than traditional survey instruments. Whereas we welcome the generally growing attention to the psychometric evaluation of experimental tasks, we are concerned about an overemphasis on reliability as the principal criterion for evaluating task goodness or interpreting correlations across tasks. In this article, we present a conceptual framework that disentangles three levels of measures: foundational measures of task goodness (i.e., signal-to-noise ratios or intraclass correlations), reliability (which reflects both task goodness and the number of trials), and uncertainty in estimated correlations (which additionally depends on sample size). Within this framework, reliability takes an intermediate position – it is useful for planning but less so for communicating task goodness or interpreting correlations. Therefore, we advocate for a shift in focus: Researchers should use foundational task goodness measures to communicate the strengths and limitations of experimental designs, and report uncertainty in correlations. To that end, we highlight the role of hierarchical models to jointly estimate trial-level noise and individual variability, thus enabling more accurate and transparent inference about covariation of individual differences.
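The three levels named in the abstract can be illustrated numerically. The sketch below is not taken from the paper. It assumes a simple model in which each trial score equals a person's true score plus independent Gaussian noise, with γ = σ_θ/σ as the signal-to-noise ratio. Under that assumption, the reliability of a person's mean over L trials is Lγ²/(Lγ² + 1). The expected observed correlation between two tasks shrinks by √(rel_x · rel_y), which is Spearman's attenuation formula. The sampling uncertainty of a correlation is approximated with Fisher's z. The variance estimator is a method-of-moments (one-way random-effects ANOVA) stand-in, not the hierarchical model the authors advocate. All function names are hypothetical.

```python
import math
import random
import statistics


def reliability(gamma: float, n_trials: int) -> float:
    """Reliability of a person's mean score over n_trials trials.

    Assumed (illustrative) model: trial score = theta_i + noise, where theta_i
    varies across people with variance sigma_theta**2 and the noise has
    variance sigma**2. gamma = sigma_theta / sigma is the signal-to-noise ratio.
    The person mean has variance sigma_theta**2 + sigma**2 / n_trials, so
    reliability = sigma_theta**2 / (sigma_theta**2 + sigma**2 / n_trials).
    With n_trials = 1 this equals the trial-level intraclass correlation.
    """
    g2 = gamma ** 2
    return n_trials * g2 / (n_trials * g2 + 1)


def attenuated_correlation(true_rho: float, rel_x: float, rel_y: float) -> float:
    """Expected observed correlation under Spearman's attenuation formula."""
    return true_rho * math.sqrt(rel_x * rel_y)


def fisher_ci(r: float, n_people: int, z_crit: float = 1.959964) -> tuple[float, float]:
    """Approximate 95% interval for a sample correlation via Fisher's z."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n_people - 3)  # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def simulate_scores(n_people, n_trials, sigma_theta, sigma, seed=0):
    """Simulate a people x trials table under the assumed model."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, sigma_theta) for _ in range(n_people)]
    return [[t + rng.gauss(0.0, sigma) for _ in range(n_trials)] for t in thetas]


def estimate_variances(scores):
    """Method-of-moments estimates (sigma**2, sigma_theta**2) from a balanced table.

    A simple one-way random-effects ANOVA stand-in for a hierarchical model.
    """
    n_trials = len(scores[0])
    # Pooled within-person variance estimates the trial-level noise sigma**2.
    sigma2 = statistics.fmean(statistics.variance(row) for row in scores)
    # Variance of person means estimates sigma_theta**2 + sigma**2 / n_trials.
    var_means = statistics.variance([statistics.fmean(row) for row in scores])
    sigma_theta2 = max(var_means - sigma2 / n_trials, 0.0)
    return sigma2, sigma_theta2


if __name__ == "__main__":
    for n in (1, 50, 100):
        print("trials:", n, "reliability:", round(reliability(0.2, n), 3))
    rel = reliability(0.2, 50)
    obs = attenuated_correlation(0.5, rel, rel)
    print("expected observed r:", round(obs, 3), "95% CI at N=100:", fisher_ci(obs, 100))
    s2, st2 = estimate_variances(simulate_scores(300, 50, 0.2, 1.0, seed=1))
    print("estimated gamma:", round(math.sqrt(st2 / s2), 3))
```

With γ = 0.2, for example, 50 trials give a reliability of about 0.67 and 100 trials give about 0.80, while γ itself is unchanged. This is the sense in which reliability mixes task goodness with a design choice (the number of trials), and why the uncertainty of a correlation additionally depends on the number of people.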

Version published to 10.31234/osf.io/xfpmb_v1 on OSF Preprints, Jun 1, 2025

Related articles

What Pilot Studies Can (and Cannot) Do for Validity in Psychological Research

This article has 16 authors:
1. Yashvin Seetahul
2. Mahmoud Medhat Elsherif
3. Caroline Zygar-Hoffmann
4. Lukas Wallrich
5. Priya Silverstein
6. Bjørn Sætrevik
7. Ilse L. Pit
8. Hannah Loenneker
9. Neele Henriette Heiser
10. Isaac J Handley-Miner
11. Christopher James Graham
12. Yu Yang Chou
13. Brett Buttliere
14. Agata Bochynska
15. Julia Beitner
16. Mary Beth Neff
This article has no evaluations. Latest version: Feb 15, 2026.

Reliability and statistical power: Conceptual background and practical implications

This article has 1 author:
1. Attila Krajcsi
This article has no evaluations. Latest version: Mar 17, 2026.

Does Trial Selection Improve the Reliability and Validity of Attentional-Control Measures? A Simulation and Systematic Reanalysis of Existing Datasets [Stage 1 Registered Report]

This article has 4 authors:
1. Niels Oliver Kempkens
2. Julia M. Haaf
3. Anna-Lena Schubert
4. Alodie Rey-Mermet
This article has no evaluations. Latest version: Feb 23, 2026.