Score-Based Tests with Fixed Effects Person Parameters in Item Response Theory: Detecting Model Misspecification Including Differential Item Functioning
Abstract
We present a fast, score-based test for detecting model misspecification in item response theory (IRT) models that remains valid when person parameters are treated as fixed effects, as may be done for very large data sets. The new approximation (i) eliminates the need to pre-specify ability groups or priors for person abilities, (ii) does not require explicit functional form assumptions, (iii) works with two estimators designed for very high item/person counts -- constrained joint maximum likelihood (CJML) and joint maximum a posteriori (JMAP) -- and (iv) requires only a single model fit, making screening for differential item functioning (DIF) faster and simpler than alternatives based on model comparisons. A spline-based residualisation step further suppresses spurious Type I errors when the ordering covariate is correlated with ability. Simulations with the two-parameter logistic model show nominal Type I error rates and high power once examinees contribute 15–20 responses; only extremely short tests (10 items) still pose challenges under strong impact. An application to 1602 reading items and 57684 students from the Mindsteps platform demonstrates scalability and practical value, flagging 13% of items for gender-related DIF, with results that correlate highly with conventional approaches that model DIF explicitly. Together, these results position the proposed test as a robust, computationally light diagnostic for large-scale assessments when classical random-effects approaches are infeasible, the ability group structure is unknown or complex, or the shape of DIF effects is unknown or complex.