Testing Whether Reported Treatment Effects Are Unduly Influenced by Item-Level Heterogeneity
Abstract
This paper addresses the situation in which treatment effects are reported using educational or psychological outcome measures composed of multiple questions or “items.” Drawing on item response theory, we distinguish among three estimands of potential interest: (a) a treatment effect on the latent variable representing the construct of interest, which we refer to as impact; (b) test-level treatment effects computed from aggregates of assessment items (e.g., the unweighted mean); and (c) item-specific effects. We show that test-level treatment effects and impact are generally not equivalent estimands in the presence of item-level treatment effect heterogeneity. Consequently, failing to distinguish these estimands can have important implications for the validity of research studies. To address this issue, we propose a diagnostic test to infer whether estimated treatment effects based on the unweighted mean of assessment items are a suitable proxy for impact on the latent trait. We illustrate the use of the test with a case study, and we provide initial evidence about the prevalence of the issue using a small meta-analysis. Results from the meta-analysis indicated that treatment effects based on the unweighted mean over assessment items often overestimated impact on the latent trait, and that this pattern was more pronounced for researcher-developed assessments than for independently developed assessments.
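To make the core phenomenon concrete, the following is a minimal simulation sketch, not the authors' diagnostic test. It assumes a two-parameter logistic (2PL) IRT model; the parameter values, the pattern of item-level effects in `delta`, and the function names are all illustrative assumptions. It shows how item-specific treatment effects can inflate the standardized effect on the unweighted mean score relative to the true impact on the latent trait.

```python
import numpy as np

rng = np.random.default_rng(1)

N, J = 20_000, 20                  # persons per arm, items
IMPACT = 0.30                      # assumed treatment effect on the latent trait (SD units)
a = rng.uniform(0.8, 2.0, J)       # 2PL item discriminations
b = rng.normal(0.0, 1.0, J)        # 2PL item difficulties

def mean_scores(treated: bool, delta: np.ndarray) -> np.ndarray:
    """Unweighted mean item score per person under a 2PL model, with
    item-specific treatment effects delta_j added on the logit scale."""
    theta = rng.normal(IMPACT if treated else 0.0, 1.0, N)
    eta = a * (theta[:, None] - b) + (delta if treated else 0.0)
    p = 1.0 / (1.0 + np.exp(-eta))
    return (rng.random((N, J)) < p).mean(axis=1)

def smd(delta: np.ndarray) -> float:
    """Standardized mean difference computed on the unweighted mean score."""
    y1, y0 = mean_scores(True, delta), mean_scores(False, delta)
    pooled_sd = np.sqrt((y1.var(ddof=1) + y0.var(ddof=1)) / 2.0)
    return (y1.mean() - y0.mean()) / pooled_sd

homogeneous = np.zeros(J)          # treatment works only through the latent trait
heterogeneous = np.zeros(J)
heterogeneous[:5] = 0.6            # e.g., a few items closely aligned with taught content

print(f"assumed impact on latent trait:     {IMPACT:.2f}")
print(f"test-level SMD, no item effects:    {smd(homogeneous):.2f}")
print(f"test-level SMD, item-level effects: {smd(heterogeneous):.2f}")
```

Under these assumptions, the heterogeneous condition yields a larger test-level standardized mean difference than the homogeneous one, even though the impact on the latent trait is identical, which is consistent with the overestimation pattern the abstract describes for researcher-developed assessments.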