Modeling Item Difficulty in Large-Scale Game-Based Language Assessment
Abstract
There is a long and successful tradition of incorporating games into cognitive ability assessment, and recent research has explored the potential of online games as engaging measures of cognitive abilities. Building on this, we investigated the popular word puzzle game Wordle as a possible measure that blends contextualized verbal abilities (lexical knowledge, orthographic processing, strategic word retrieval) with decontextualized reasoning processes (hypothesis testing, pattern recognition). Specifically, we examined the extent to which item difficulty in the German adaptation GridWords can be predicted from linguistic features. The database comprises a sample of 872 GridWords collected from 76,095 players over two years. Moving beyond prior approaches limited to narrow sets of predictors, we integrated several features from psycholinguistic research on word recognition, such as word usage and letter frequencies, orthographic neighborhood size, and letter repetitions. Using contemporary machine learning methods in a nested cross-validation approach, we found out-of-sample performance of R² = 53% (Elastic Net Regression) and R² = 59% (Gradient Boosting Machines) in item difficulty prediction. We discuss feature importances as indicators of the underlying cognitive processes during gameplay, as well as the strengths and limitations of a game-based approach to ability assessment.
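The nested cross-validation setup described above, with hyperparameter tuning in an inner loop and out-of-sample R² estimated in an outer loop, can be sketched as follows. This is a minimal illustration, not the authors' code: the synthetic data stands in for the 872-item feature matrix, and the hyperparameter grids are hypothetical.

```python
# Sketch of nested cross-validation for item-difficulty prediction
# with Elastic Net and Gradient Boosting (scikit-learn).
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score, KFold

# Synthetic stand-in for the items-by-linguistic-features matrix;
# y plays the role of empirical item difficulty.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

models = {
    "elastic_net": (ElasticNet(max_iter=10_000),
                    {"alpha": [0.1, 1.0], "l1_ratio": [0.2, 0.8]}),
    "gbm": (GradientBoostingRegressor(random_state=0),
            {"n_estimators": [100, 300], "max_depth": [2, 3]}),
}

outer = KFold(n_splits=5, shuffle=True, random_state=0)  # performance estimate
inner = KFold(n_splits=3, shuffle=True, random_state=0)  # hyperparameter tuning

results = {}
for name, (estimator, grid) in models.items():
    # The inner GridSearchCV is refit inside every outer training fold,
    # so the outer R² is an unbiased out-of-sample estimate.
    search = GridSearchCV(estimator, grid, cv=inner, scoring="r2")
    scores = cross_val_score(search, X, y, cv=outer, scoring="r2")
    results[name] = scores.mean()
    print(f"{name}: mean out-of-sample R² = {results[name]:.2f}")
```

The key design choice is that tuning never sees the outer test folds, which is what makes the reported R² values honest out-of-sample estimates rather than training-set fit.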