Differential Item Functioning analysis in SABER 11. A case study for DIF in large-scale assessments

Victor Hernando Cervantes
Alexander Calderon
Nelson Rodriguez

Abstract

For every test, evidence of its validity should be examined. Absence of Differential Item Functioning (DIF) is an important piece of evidence to support the validity of group comparisons of test results. In this paper, we examine DIF between two test forms of the Mathematics test of SABER 11, a large-scale assessment (LSA) in Colombia. We illustrate how to tailor the process for identifying DIF by following the set of guiding questions proposed by Sireci and Rios (2013, Educational Research and Evaluation, 19(2-3), 170–187) and answering each of them for the analyzed test. Additionally, we present the results of three simulation studies conducted to investigate the performance of the non-compensatory DIF (NCDIF) index under large sample sizes and large sample size ratios (up to 1:25), as well as the performance of the effect size guidelines (Wright and Oshima, 2015, Educational and Psychological Measurement, 75(2), 338–358) under these conditions. We conducted these simulation studies because a gap in the literature on this DIF index obstructed the decisions required to complete the analyses of the SABER 11 tests. Their results allowed us to choose the sample sizes to use in the analysis of the real SABER 11 data and to include the effect size as part of the detection procedure. The results also shed light on the performance of the NCDIF index more generally, across several conditions applicable not only to SABER 11 but possibly to other LSAs. Lastly, they suggest that simulation studies examining the performance of NCDIF, and possibly of any DIF statistic, should use realistic item parameter pools and not only sanitized, well-distributed sets of item parameters.
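For readers unfamiliar with the index, the following is a minimal sketch of how NCDIF is commonly defined: the mean, over focal-group abilities, of the squared difference between the item response probabilities implied by the focal-group and reference-group item parameters. The 2PL model, the parameter values, the simulated abilities, and the function names `p_2pl` and `ncdif` are illustrative assumptions, not details taken from the paper. The sketch also assumes both sets of item parameters are already on a common scale, and it does not reproduce the significance test or the effect size guidelines.

```python
import math
import random


def p_2pl(theta, a, b):
    """Probability of a correct response under a 2PL IRT model (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def ncdif(focal_thetas, ref_params, focal_params):
    """Mean squared difference between the focal- and reference-calibrated
    response probabilities, averaged over focal-group abilities."""
    diffs = [
        p_2pl(t, *focal_params) - p_2pl(t, *ref_params)
        for t in focal_thetas
    ]
    return sum(d * d for d in diffs) / len(diffs)


# Hypothetical focal-group abilities, drawn from a standard normal.
rng = random.Random(0)
focal_thetas = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Identical parameters in both groups: no DIF, so the index is exactly zero.
no_dif = ncdif(focal_thetas, (1.2, 0.0), (1.2, 0.0))

# Difficulty shifted by 0.5 in the focal group (uniform DIF): the index is positive.
with_dif = ncdif(focal_thetas, (1.2, 0.0), (1.2, 0.5))
```

Because the index averages a squared difference, it cannot cancel across ability levels, which is why it is called non-compensatory. In practice the item parameters are estimates, so a flagging rule also has to account for their sampling error, which is the kind of decision the simulation studies described above address.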

Version published to 10.31234/osf.io/am73w on OSF Preprints
Mar 26, 2024

Related articles

On the Robustness of Statistical Results to Data Exclusion in t-Tests

This article has 6 authors:
1. Tsz Keung Wong
2. Robbie Cornelis Maria van Aert
3. Jelte M. Wicherts
4. Lieke Voncken
5. Mark Verschoor
6. Marcel A. L. M. van Assen
This article has no evaluations. Latest version: Jan 19, 2026

Application of Item Response Theory (IRT) to the GHQ-12 in Spanish University Students

This article has 4 authors:
1. Sergio Navas-León
2. Rodrigo Schames Kreitchmann
3. Francisco Pablo Holgado-Tello
4. Diego Díaz-Milanés
This article has no evaluations. Latest version: Dec 29, 2025

Methodological Flexibility in the Iowa Gambling Task Undermines Interpretability: A Meta-method Review

This article has 4 authors:
1. Annika Iris Külpmann
2. Jan-Paul Ries
3. Ian Hussey
4. Malte Elson
This article has no evaluations. Latest version: Jan 24, 2026
