Comparative judgement without the fancy statistics
Abstract
Comparative judgement methods for assessment are increasingly popular. They involve assessors making comparisons about the ‘quality’ of pairs of students’ work, and the comparisons are statistically modelled to produce scores. Recently, Benton and Gallacher (2018, p.25) claimed that “much of the apparent advantage of [comparative judgement] can be explained by its use of fancy statistics”. They evidenced this by applying ‘fancy statistics’ to raw scores from multiple marked essays, and comparing the predictive value of the raw scores with the fancy statistics outcomes. Here I take the inverse approach and compare raw scores from comparative judgement assessments with fancy statistics outcomes. I reanalysed studies from peer-reviewed outlets in which the main measure was based on comparative judgement. I report that raw scores reduced the reliability and validity of outcomes relative to fancy statistics in about one fifth of cases. I consider the implications of the findings for using comparative judgement in educational research.
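For readers unfamiliar with the two approaches being compared: the ‘fancy statistics’ in comparative judgement typically means fitting a Bradley–Terry-style model to the pairwise comparisons, while the ‘raw score’ alternative is simply each piece of work’s proportion of comparisons won. The sketch below is a minimal illustration of both (not the analysis used in this article or in Benton and Gallacher’s study), using a simple iterative fit of Bradley–Terry strengths; all function names and the toy data are my own.

```python
from collections import defaultdict

def bradley_terry(comparisons, iters=100):
    """Fit Bradley-Terry strengths with a simple iterative (MM) update.

    comparisons: list of (winner, loser) pairs of item labels.
    Returns a dict mapping item -> strength, normalised to sum to 1.
    """
    items = set()
    wins = defaultdict(int)   # total wins per item
    n = defaultdict(int)      # number of comparisons per unordered pair
    for w, l in comparisons:
        items.update((w, l))
        wins[w] += 1
        n[frozenset((w, l))] += 1

    p = {i: 1.0 for i in items}
    for _ in range(iters):
        new_p = {}
        for i in items:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                n[frozenset((i, j))] / (p[i] + p[j])
                for j in items
                if j != i and n[frozenset((i, j))] > 0
            )
            new_p[i] = wins[i] / denom if denom else p[i]
        total = sum(new_p.values())
        p = {i: v / total for i, v in new_p.items()}
    return p

def raw_scores(comparisons):
    """'Raw score' alternative: each item's proportion of comparisons won."""
    wins, total = defaultdict(int), defaultdict(int)
    for w, l in comparisons:
        wins[w] += 1
        total[w] += 1
        total[l] += 1
    return {i: wins[i] / total[i] for i in total}
```

On a toy set of judgements such as `[("A","B"), ("A","B"), ("A","C"), ("B","C")]`, both approaches rank A above B above C; the question the article examines is how often, on real data, the modelled scores and the raw win proportions lead to different reliability and validity conclusions.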