Anchor Is the Key: Toward Accessible Automated Essay Scoring with Large Language Models Through Prompting

Abstract

Automated Essay Scoring (AES) offers a scalable solution to the time-intensive and inconsistent nature of human scoring. While traditional AES systems require large sets of prompt-specific scored essays, large language models (LLMs) provide a powerful, adaptable alternative, capable of evaluating essays holistically without extensive sets of pre-scored essays. However, most research on LLM-based AES focuses on resource-intensive optimization methods that are impractical for educators. In this study, we examine prompting, the most practical and accessible way for teachers to interact with LLMs, and its impact on holistic essay scoring. Using argumentative essays from secondary school students, we evaluate the effectiveness of incorporating grading rubrics, source materials, and anchor papers into prompts. Our results show that providing anchor papers significantly improves LLM-human agreement, bringing it closer to human-human scoring reliability. Moreover, while GPT-4o outperforms other models, GPT-4o mini achieves comparable results at a substantially lower cost. These findings highlight the potential of structured prompting strategies in enhancing the accuracy and accessibility of LLM-based AES in education.
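
To make the prompting strategy concrete, the sketch below shows one way a rubric, source material, and anchor papers could be combined into a single scoring prompt sent to GPT-4o mini via the OpenAI Python client. This is a minimal illustration only: the rubric text, anchor essays, score scale, and prompt wording are hypothetical placeholders, not the prompt used in the study.

```python
# Illustrative sketch only: all rubric, source, and anchor texts below are
# hypothetical placeholders, not the prompt or data from the study.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = "Score 1-6 holistically: claim quality, use of evidence, organization, language."
SOURCE_MATERIAL = "<source passage the students wrote about>"
ANCHOR_PAPERS = [
    {"score": 2, "essay": "<example essay previously scored 2 by human raters>"},
    {"score": 4, "essay": "<example essay previously scored 4 by human raters>"},
    {"score": 6, "essay": "<example essay previously scored 6 by human raters>"},
]

def build_prompt(student_essay: str) -> str:
    """Combine rubric, source material, and anchor papers into one scoring prompt."""
    anchors = "\n\n".join(
        f"Anchor essay (human score {a['score']}):\n{a['essay']}" for a in ANCHOR_PAPERS
    )
    return (
        f"You are scoring a secondary-school argumentative essay.\n\n"
        f"Rubric:\n{RUBRIC}\n\n"
        f"Source material:\n{SOURCE_MATERIAL}\n\n"
        f"Anchor papers:\n{anchors}\n\n"
        f"Student essay:\n{student_essay}\n\n"
        f"Return only the holistic score as an integer from 1 to 6."
    )

def score_essay(student_essay: str, model: str = "gpt-4o-mini") -> int:
    """Request a single holistic score for the essay from the model."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # keep scoring as deterministic as the API allows
        messages=[{"role": "user", "content": build_prompt(student_essay)}],
    )
    return int(response.choices[0].message.content.strip())
```

Scores produced this way can then be compared against human ratings with a standard agreement measure such as quadratic weighted kappa, which is how LLM-human agreement is commonly assessed in AES work.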
