Testing Standards for AI-based Scores in Automated Essay Scoring

Abstract

Recent developments in computer science, in particular in artificial intelligence and machine learning, enable the wide application of large language models to the evaluation of written text and other non-numerical data. When applied in psychological and educational assessment, such models can assign scores to essays and other types of responses. In contrast to classical tests, essays do not consist of discrete test items, which creates specific challenges for evaluating testing standards for AI-based scores, challenges that differ from those encountered with classical ability tests and personality questionnaires. To address these challenges, we discuss the evaluation of validity, fairness, and reliability for scores obtained from artificial intelligence models in the context of automated essay scoring. The discussion includes a review of existing methods and the proposal of new ones. We illustrate the proposed methods with an empirical example and suggest directions for the development of additional methods.