Rethinking Item Fairness Using Single World Intervention Graphs
Abstract
Since the 1960s, the testing community has striven to ensure fairness in tests and test items. Differential item functioning (DIF) is a widely used statistical notion for items that may unfairly disadvantage specific subgroups of test-takers. However, traditional DIF analyses focus only on statistical relationships in observed data and cannot explain why such unfairness occurs. To fill this gap, we introduce a novel causal framework for defining and detecting unfair items using single world intervention graphs (SWIGs). By leveraging SWIGs and potential outcomes, we define causal DIF (CDIF) as the difference in item functioning between two hypothetical worlds: one where individuals were assigned to one group and another where they were assigned to a different group. We also connect CDIF to related fairness concepts, including group versus individual fairness and item impact. In particular, we use SWIGs to graphically distinguish between item fairness at the individual level and the population level. Additionally, we discuss causal identification strategies using SWIGs and illustrate how our approach can be applied to a controversial item from New York's Regents math exam. We further demonstrate how it differs from traditional DIF methods through a simulation study and conclude with the broader implications of promoting causal fairness in testing practices.