Towards Transparent and Context-Aware Automated Essay Scoring


Abstract

Evaluating descriptive answer scripts in examinations is time-consuming and prone to human bias. Although advances in Natural Language Processing (NLP) and Artificial Intelligence (AI) have improved Automated Essay Scoring (AES) systems, significant challenges remain, including inter-rater variability, intra-rater inconsistency, and the difficulty of assessing subjective responses. Existing AES models, developed over more than fifty years, rely predominantly on student-to-student answer comparisons, which can produce biased outcomes such as over-scoring incorrect answers and undervaluing correct ones. In contrast, course-based examinations require student responses to align with teacher-defined expectations, which supports fairness, efficiency, and clarity in grading. To address this gap, we present Reference-based AES (RAES), a framework that scores digitized student answers (SDAessay) by direct comparison with teacher-provided reference answers. Ablation studies show that the framework yields impartial and meaningful evaluation while reducing the biases inherent in conventional AES approaches. The findings also reveal limitations of reference-free AES models, which can assign high scores to incorrect responses, underscoring the need for expectation-driven evaluation in educational assessment.
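The central idea of reference-based scoring, comparing a student's digitized answer with the teacher's reference answer rather than with other students' answers, can be illustrated with a minimal sketch. The TF-IDF cosine similarity, the reference_based_score function, and the 10-mark scale below are illustrative assumptions, not the paper's RAES implementation.

```python
# Minimal sketch of reference-based scoring (illustrative only, not RAES itself):
# a student answer is scored against a teacher-provided reference answer
# using TF-IDF cosine similarity scaled to an assumed mark range.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def reference_based_score(student_answer: str, reference_answer: str,
                          max_marks: float = 10.0) -> float:
    """Score a student answer by its lexical similarity to the reference answer."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on both texts so they share one vocabulary, then compare the vectors.
    vectors = vectorizer.fit_transform([reference_answer, student_answer])
    similarity = cosine_similarity(vectors[0], vectors[1])[0, 0]
    return round(similarity * max_marks, 2)


if __name__ == "__main__":
    reference = ("Photosynthesis converts light energy into chemical energy, "
                 "producing glucose and oxygen from carbon dioxide and water.")
    student = ("Plants use light to turn carbon dioxide and water into "
               "glucose and oxygen.")
    print(reference_based_score(student, reference))  # similarity scaled to marks
```

A surface-level similarity like this would reward lexical overlap with the reference rather than with other students' answers; a full system such as RAES would presumably use richer semantic comparison, but the expectation-driven structure is the same.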
