Stacked Ensemble Learning for Content-Based Item Difficulty Prediction

Abstract

Content-based prediction of item parameters has potential value for supporting item development and calibration-related tasks, particularly when operational calibration is costly or slow. The present study evaluated a three-stage stacked ensemble framework for predicting item difficulty from item content using 1,079 verbal reasoning items from a higher-education admission test. In Stage 1, a large language model coded 10 theoretically motivated content features, which then served as predictors in a random forest. In Stage 2, a transformer encoder (DeBERTa-v3-base) was fine-tuned to predict difficulty directly from raw item text. In Stage 3, a ridge regression meta-learner combined the Stage 1 and Stage 2 predictions. Performance was evaluated across five random train-test splits using Pearson correlation and root mean squared error (RMSE). The feature-based model outperformed the standalone text-based model on held-out data (r = .314 vs. .273), suggesting that structured, cognitively oriented features were more informative than encoder-only text representations in this dataset. The stacked model yielded the highest test-set correlation (r = .354, RMSE = 0.743), indicating a modest improvement over either base learner alone and supporting the view that the two approaches captured partially complementary information. Feature-importance analyses indicated that reasoning steps, task type, and option complexity were the strongest unique predictors. Although the observed level of accuracy was insufficient for standalone operational use, the findings suggest that item content contains recoverable information about difficulty and that integrating interpretable feature-based and text-based representations is a promising direction for supporting calibration workflows.
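The Stage 3 combination and the evaluation protocol described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the two base-learner prediction vectors are simulated stand-ins for the random forest and DeBERTa outputs, and the names `fit_ridge_meta`, `simulate`, and `evaluate_split`, the penalty `alpha=1.0`, and the 80/20 split are all assumptions made for illustration. The sketch also assumes the base predictions fed to the meta-learner were produced out-of-fold, which the abstract does not specify.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_ridge_meta(p1, p2, y, alpha=1.0):
    """Closed-form ridge regression on two base-learner predictions.

    Inputs are centered so the intercept is not penalized.
    Returns (w1, w2, intercept).
    """
    n = len(y)
    m1, m2, my = sum(p1) / n, sum(p2) / n, sum(y) / n
    c1 = [v - m1 for v in p1]
    c2 = [v - m2 for v in p2]
    cy = [v - my for v in y]
    # Solve the 2x2 system (X'X + alpha * I) w = X'y by Cramer's rule.
    a11 = sum(v * v for v in c1) + alpha
    a22 = sum(v * v for v in c2) + alpha
    a12 = sum(u * v for u, v in zip(c1, c2))
    b1 = sum(u * v for u, v in zip(c1, cy))
    b2 = sum(u * v for u, v in zip(c2, cy))
    det = a11 * a22 - a12 * a12
    w1 = (b1 * a22 - a12 * b2) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2, my - w1 * m1 - w2 * m2


def simulate(n, rng):
    """Synthetic difficulties plus two noisy, partly complementary 'base predictions'."""
    p1, p2, y = [], [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        y.append(a + b + rng.gauss(0, 1))   # hypothetical true difficulty
        p1.append(a + rng.gauss(0, 0.5))    # stand-in for the feature-based model
        p2.append(b + rng.gauss(0, 0.5))    # stand-in for the text-based model
    return p1, p2, y


def evaluate_split(p1, p2, y, seed, test_frac=0.2):
    """Fit the meta-learner on one random train split and score it on the test split."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    tr, te = idx[:cut], idx[cut:]
    w1, w2, b = fit_ridge_meta([p1[i] for i in tr], [p2[i] for i in tr], [y[i] for i in tr])
    y_te = [y[i] for i in te]
    stacked = [w1 * p1[i] + w2 * p2[i] + b for i in te]
    return {
        "r_base1": pearson_r(y_te, [p1[i] for i in te]),
        "r_base2": pearson_r(y_te, [p2[i] for i in te]),
        "r_stacked": pearson_r(y_te, stacked),
        "rmse_stacked": rmse(y_te, stacked),
    }


p1, p2, y = simulate(1000, random.Random(0))
results = [evaluate_split(p1, p2, y, seed) for seed in range(5)]  # five random splits
for key in ("r_base1", "r_base2", "r_stacked", "rmse_stacked"):
    print(key, round(sum(r[key] for r in results) / len(results), 3))
```

Averaging each metric over the five splits mirrors the reporting style in the abstract. The simulated numbers have no bearing on the reported results and only show how a linear meta-learner can exploit two partially complementary predictors.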
