Results from Randomized Controlled Trials are Highly Sensitive to Data Preprocessing Decisions: A Multiverse Analysis of 97 Outcomes
Abstract
The sensitivity of statistical results to data preprocessing decisions is well known, but the influence of preprocessing on empirical data from randomized controlled trials (RCTs) is less well quantified. We examine preprocessing sensitivity across 97 outcomes drawn from behavioral RCTs in the Item Response Warehouse. For each outcome, we apply a 4 × 3 × 3 grid of defensible preprocessing decisions (outlier handling, missing-data handling, and outcome transformation), yielding 36 pipelines, and estimate the average treatment effect (ATE) under each pipeline with a fixed ANCOVA-style linear model. Across outcomes, the median within-outcome standard deviation (SD) of the ATE across pipelines is 0.045 and the median range is 0.116 (SD min–max: 0.005–0.355; range min–max: 0.013–0.787). Because one transformation option is standardization (an affine rescaling), we distinguish transformation-driven variation that is mechanical (unit changes) from non-mechanical variation due to nonlinear re-expression (a shifted log transformation). In coefficient-scale decompositions, the transformation factor explains most of the dispersion in reported ATE magnitudes (median η²_transform = 0.811), but scale-free robustness analyses show a different pattern: decomposing variability in the treatment t-statistic (which is invariant to affine rescaling) reveals that outlier handling dominates changes in statistical evidence (median η²_outlier = 0.570), while the transformation component is small (median η²_transform = 0.014). A within-pipeline standardized effect-size decomposition (computed after all preprocessing steps, including the log branch) yields a similar pattern (median η²_outlier = 0.599; median η²_transform = 0.020). A linear-only, fixed-denominator effect-size analysis confirms that once unit changes are removed, the remaining variability is driven primarily by outlier handling (median η²_outlier = 0.858), with missing-data handling contributing in some outcomes (median η²_missing = 0.094).
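The distinction between mechanical and non-mechanical transformation effects rests on the fact that an affine rescaling changes the treatment coefficient but not its t-statistic. The following is a minimal Python sketch of that property on simulated data. It assumes an ANCOVA-style model with an intercept, a treatment indicator, and a single baseline covariate. The helper `ols_treatment` and all data-generating values are hypothetical illustrations, not the study's code or data.

```python
import numpy as np


def ols_treatment(y, treat, baseline):
    """Hypothetical helper: OLS fit of y ~ 1 + treat + baseline; returns (ATE, t-statistic)."""
    X = np.column_stack([np.ones_like(y), treat, baseline])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])  # residual variance
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])  # SE of the treatment coefficient
    return beta[1], beta[1] / se


# Simulated outcome (assumed values, for illustration only).
rng = np.random.default_rng(0)
n = 200
treat = rng.integers(0, 2, n).astype(float)
baseline = rng.normal(size=n)
y = 0.3 * treat + 0.5 * baseline + rng.normal(size=n)

ate_raw, t_raw = ols_treatment(y, treat, baseline)

# Standardization is affine: the coefficient is rescaled by 1/SD, the t-statistic is not.
z = (y - y.mean()) / y.std(ddof=1)
ate_std, t_std = ols_treatment(z, treat, baseline)

# The shifted log is nonlinear, so both the coefficient and the t-statistic can change.
ate_log, t_log = ols_treatment(np.log(y - y.min() + 1.0), treat, baseline)

print(f"raw:  ATE={ate_raw:.3f}  t={t_raw:.3f}")
print(f"std:  ATE={ate_std:.3f}  t={t_std:.3f}")
print(f"log:  ATE={ate_log:.3f}  t={t_log:.3f}")
```

Standardizing divides the coefficient and its standard error by the same SD, so their ratio is unchanged. The shifted log re-expresses the outcome nonlinearly, so its t-statistic will in general differ from the raw one.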
These results imply that preprocessing decisions should be treated as an explicit component of the analytic model, reported routinely, and stress-tested via multiverse-style sensitivity analyses.
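A multiverse-style sensitivity analysis of this kind can be sketched as follows. This is a hypothetical Python illustration, not the study's pipeline. The following are all assumptions: the four outlier rules, the three missing-data rules, the identity option alongside standardization and the shifted log, the order in which steps are applied (missing data, then outliers, then transformation), and the simulated data. For each of the 36 pipelines the sketch records the ATE and its t-statistic. It then reports the across-pipeline SD and range, plus a main-effect η² for each factor: that factor's between-level sum of squares divided by the total sum of squares across pipelines.

```python
import itertools

import numpy as np

# Hypothetical option sets (4 x 3 x 3 = 36 pipelines); not the study's exact rules.
OUTLIER = ["none", "winsor_1", "winsor_5", "trim_3sd"]
MISSING = ["complete_case", "mean_impute", "arm_mean_impute"]
TRANSFORM = ["raw", "standardize", "shifted_log"]


def handle_missing(y, treat, baseline, rule):
    """Drop or impute missing outcomes (NaN)."""
    obs = ~np.isnan(y)
    if rule == "complete_case":
        return y[obs], treat[obs], baseline[obs]
    y = y.copy()
    if rule == "mean_impute":
        y[~obs] = np.nanmean(y)  # pooled observed mean
    else:  # arm_mean_impute: observed mean within each treatment arm
        for arm in (0.0, 1.0):
            in_arm = treat == arm
            y[in_arm & ~obs] = np.nanmean(y[in_arm])
    return y, treat, baseline


def handle_outliers(y, treat, baseline, rule):
    """Leave, trim, or winsorize extreme outcome values."""
    if rule == "none":
        return y, treat, baseline
    if rule == "trim_3sd":  # drop rows more than 3 SD from the mean
        keep = np.abs(y - y.mean()) <= 3 * y.std(ddof=1)
        return y[keep], treat[keep], baseline[keep]
    q = 0.01 if rule == "winsor_1" else 0.05  # clip to the [q, 1 - q] quantiles
    lo, hi = np.quantile(y, [q, 1 - q])
    return np.clip(y, lo, hi), treat, baseline


def transform(y, rule):
    if rule == "raw":
        return y
    if rule == "standardize":  # affine: changes units only
        return (y - y.mean()) / y.std(ddof=1)
    return np.log(y - y.min() + 1.0)  # shifted log: nonlinear re-expression


def ols_treatment(y, treat, baseline):
    """ANCOVA-style OLS y ~ 1 + treat + baseline; returns (ATE, t-statistic)."""
    X = np.column_stack([np.ones_like(y), treat, baseline])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], beta[1] / se


def run_multiverse(y, treat, baseline):
    rows = []
    for o, m, t in itertools.product(OUTLIER, MISSING, TRANSFORM):
        yy, tt, bb = handle_missing(y, treat, baseline, m)
        yy, tt, bb = handle_outliers(yy, tt, bb, o)
        ate, tstat = ols_treatment(transform(yy, t), tt, bb)
        rows.append({"outlier": o, "missing": m, "transform": t, "ate": ate, "t": tstat})
    return rows


def eta_squared(rows, stat, factor):
    """Main-effect eta^2: between-level sum of squares / total sum of squares."""
    vals = np.array([r[stat] for r in rows])
    grand = vals.mean()
    ss_total = ((vals - grand) ** 2).sum()
    ss_factor = 0.0
    for level in {r[factor] for r in rows}:
        sub = np.array([r[stat] for r in rows if r[factor] == level])
        ss_factor += len(sub) * (sub.mean() - grand) ** 2
    return ss_factor / ss_total


# Simulated single outcome: heavy-tailed noise and about 10% missing outcomes.
rng = np.random.default_rng(1)
n = 300
treat = rng.integers(0, 2, n).astype(float)
baseline = rng.normal(size=n)
y = 0.3 * treat + 0.5 * baseline + rng.standard_t(df=3, size=n)
y[rng.random(n) < 0.1] = np.nan

rows = run_multiverse(y, treat, baseline)
ates = np.array([r["ate"] for r in rows])
print("pipelines:", len(rows), "SD:", ates.std(ddof=1), "range:", ates.max() - ates.min())
for f in ("outlier", "missing", "transform"):
    print(f, "eta2(ATE) =", round(eta_squared(rows, "ate", f), 3),
          "eta2(t) =", round(eta_squared(rows, "t", f), 3))
```

Every combination appears exactly once, so the three main-effect η² values are orthogonal and sum to at most 1. The remainder is attributable to interactions between preprocessing factors. Comparing the η² values computed on the ATE with those computed on the t-statistic separates unit-driven variation from changes in statistical evidence.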