Controlling the False Discovery Rate in DIF Detection With e-Values: Evidence From Multidimensional and Testlet Simulations

Shan Huang
David Goretzko

Abstract

This study presents the first application of e-value–based false discovery rate (FDR) control to Differential Item Functioning (DIF) detection, addressing long-standing limitations of p-value-based approaches when model assumptions are violated, for example under multidimensionality, local item dependence, or extreme sample sizes. Two comprehensive simulation studies were conducted to evaluate e-BH, the e-value analogue of the Benjamini–Hochberg (BH) procedure, using K-fold and Multisplit likelihood-ratio e-values, under (a) multidimensional contamination and (b) testlet-based local dependence. Across both scenarios, e-BH consistently provided stronger and more stable control of Type I error, FDR, and family-wise error rate (FWER) than classical procedures such as BH and Holm. Even under severe model misspecification, e-BH maintained substantially lower false-positive rates while remaining relatively competitive in terms of Type II error. A key finding concerns sample size: classical p-value methods exhibited inflation of Type I error as N increased, whereas e-BH preserved stable error control due to its model-agnostic calibration. An empirical application using Progress in International Reading Literacy Study (PIRLS) data further demonstrated that e-BH produces a more defensible and operationally sustainable set of DIF flags than traditional approaches. Together, these results establish e-values as a powerful and robust evidential tool for DIF detection in modern assessment contexts.
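
For readers unfamiliar with the two decision rules the abstract compares, the following is a minimal sketch of the standard e-BH and BH step-up rules, not the authors' implementation. It assumes one e-value (or p-value) per item has already been computed; the K-fold and Multisplit likelihood-ratio constructions are not shown. The function names `e_bh` and `bh` and the example values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def e_bh(e_values: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices flagged by the e-BH rule at target FDR level alpha.

    Rule: with e-values sorted from largest to smallest, find the largest k
    such that the k-th largest e-value is at least m / (alpha * k), then
    flag the k items with the largest e-values.
    """
    m = len(e_values)
    # Item indices ordered by e-value, largest first.
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, idx in enumerate(order, start=1):
        # Equivalent to e_(k) >= m / (alpha * k), written without division.
        if e_values[idx] * k * alpha >= m:
            k_star = k
    return sorted(order[:k_star])


def bh(p_values: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices flagged by the classical Benjamini-Hochberg rule.

    Rule: with p-values sorted from smallest to largest, find the largest k
    such that the k-th smallest p-value is at most alpha * k / m, then
    flag the k items with the smallest p-values.
    """
    m = len(p_values)
    # Item indices ordered by p-value, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_star = 0
    for k, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * k / m:
            k_star = k
    return sorted(order[:k_star])


# Hypothetical evidence for five items (made-up numbers, not study data).
example_e = [120.0, 1.5, 60.0, 0.8, 2.0]
example_p = [0.001, 0.2, 0.012, 0.6, 0.04]
flagged_by_e_bh = e_bh(example_e, alpha=0.05)
flagged_by_bh = bh(example_p, alpha=0.05)
```

The two rules mirror each other: a large e-value plays the role of a small p-value, and the e-BH threshold m / (alpha * k) is the reciprocal of the BH threshold alpha * k / m.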

Version published to 10.1177/00131644261433236
Apr 16, 2026
Version published to 10.31234/osf.io/dfyp6_v1 on OSF Preprints
Mar 18, 2026

Related articles

The Statistical Costs of Two-Step Signal Detection Analyses: A Case for a Maximum Likelihood Mixed-Effects Approach

This article has 4 authors:
1. Marie Jakob
2. Raphael Hartmann
3. Karl Christoph Klauer
4. Constantin Gregor Meyer-Grant
This article has no evaluations. Latest version: Mar 12, 2026

A Comparative Evaluation of Multiple Hypothesis Testing Adjustment Methods

This article has 1 author:
1. Carl Dolling
This article has no evaluations. Latest version: Apr 6, 2026

A new effect size for meta-analysis of magnitude: lnM

This article has 9 authors:
1. Shinichi Nakagawa
2. Ayumi Mizuno
3. Coralie Williams
4. Malgorzata Lagisz
5. Rose O'Dea
6. Daniel Noble
7. Alistair Senior
8. Erick Lundgren
9. Santiago Ortega
This article has no evaluations. Latest version: Feb 23, 2026
