Is the Replication Crisis a Measurement Crisis? Evidence from Over 100 Randomized Trial Outcomes


Abstract

Randomized controlled trials (RCTs) are the gold standard for causal inference, yet the validity of RCT conclusions depends not only on randomization but also on how outcomes are measured and scored. We analyze item-level data from 112 RCT outcome measures spanning psychology, medicine, public health, and education to test whether foundational measurement assumptions are met and whether alternative scoring approaches alter conclusions. Across disciplines, measurement assumptions are rarely evaluated and frequently violated. For nearly half of outcomes, the single score used is unlikely to be a plausible representation of the data. Moreover, when outcomes are scored using statistical models aligned with the study design and better matched to the data, rather than sum scores, the proportion of pre/post trials reporting statistically significant treatment effects approximately doubles. These findings indicate that routine measurement decisions can systematically shift causal inferences, suggesting that a largely unexamined aspect of RCT analysis may be at the heart of replication failures.
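The contrast at the center of the abstract, unit-weighted sum scores versus scores from a fitted measurement model, can be illustrated on simulated data. The sketch below is not the authors' analysis; it assumes a hypothetical one-factor structure with unequal item loadings and uses scikit-learn's `FactorAnalysis` to show that the two scorings of the same item responses need not agree perfectly.

```python
# Hedged sketch (not the paper's code): sum scores vs. factor scores on
# simulated item-level data with a single latent trait.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_people, n_items = 500, 8

# Hypothetical unequal loadings: some items measure the trait weakly.
loadings = np.linspace(0.2, 0.9, n_items)
theta = rng.normal(size=(n_people, 1))                      # latent trait
items = theta * loadings + rng.normal(scale=0.5, size=(n_people, n_items))

# Scoring 1: unit-weighted sum score (every item counts equally).
sum_scores = items.sum(axis=1)

# Scoring 2: score from a fitted one-factor model (items weighted by fit).
factor_scores = FactorAnalysis(n_components=1, random_state=0) \
    .fit_transform(items).ravel()

# The two scorings correlate strongly but not perfectly: weak items
# dilute the sum score, so respondent orderings can differ.
r = abs(np.corrcoef(sum_scores, factor_scores)[0, 1])
print(f"|correlation| between sum and factor scores: {r:.3f}")
```

When loadings are unequal, the gap between the two scorings grows, which is one mechanism by which the choice of scoring model could move a treatment effect across a significance threshold.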