Explainable AI for Education: Enhancing Essay Scoring via Rubric-Aligned Chain-of-Thought Prompting

Abstract

Automated Essay Scoring (AES) systems increasingly leverage fine-tuned large language models (LLMs) to improve scoring accuracy and feedback generation. However, current LLM-based AES approaches often lack interpretability and consistent alignment with human scoring rubrics, which limits their practical adoption in educational settings. This study proposes QwenScore+, a novel framework that integrates rubric-aligned Chain-of-Thought (CoT) prompting with reinforcement learning from human feedback (RLHF) to improve the transparency, quality, and educational alignment of automated feedback. QwenScore+ is evaluated on a proprietary IELTS writing dataset of more than 5,000 essays annotated with trait-level scores and expert-written feedback. Experimental results show that QwenScore+ significantly outperforms strong baselines such as BERT, GPT-3.5, and GPT-4 in feedback generation, achieving higher BLEU, ROUGE-L, and cosine similarity scores, alongside gains in trait-level scoring measured by quadratic weighted kappa (QWK). Furthermore, rubric-aligned CoT prompting yields feedback that more closely mirrors human reasoning patterns, as confirmed by both automatic metrics and human evaluations. These findings highlight the potential of combining explainable reasoning strategies with human-aligned reward optimization to build more transparent, reliable, and pedagogically valuable AES systems for real-world use.
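To make the two core ideas in the abstract concrete, the sketch below shows (a) how a rubric-aligned CoT prompt for IELTS-style trait scoring might be composed, and (b) how trait-level agreement could be measured with quadratic weighted kappa. This is a minimal illustration under assumed conventions: the prompt wording, the helper names `build_rubric_cot_prompt` and `quadratic_weighted_kappa`, and the rubric descriptions are illustrative, not the paper's QwenScore+ implementation, which additionally involves fine-tuning and RLHF.

```python
# Illustrative sketch: rubric-aligned CoT prompting for trait-level essay scoring,
# plus QWK agreement against human scores. Names and prompt text are assumptions,
# not the QwenScore+ pipeline described in the paper.
from sklearn.metrics import cohen_kappa_score

# IELTS-style writing traits, each scored on a 0-9 band scale.
RUBRIC = {
    "Task Response": "Does the essay fully address all parts of the task with a clear position?",
    "Coherence and Cohesion": "Is the essay logically organised with effective paragraphing and linking?",
    "Lexical Resource": "Is vocabulary varied, precise, and largely error-free?",
    "Grammatical Range and Accuracy": "Are sentence structures varied and accurate?",
}

def build_rubric_cot_prompt(essay: str) -> str:
    """Compose a Chain-of-Thought prompt that walks the model through each rubric
    trait (evidence -> reasoning -> band) before it commits to scores, so the
    generated rationale stays tied to rubric criteria."""
    criteria = "\n".join(f"- {trait}: {desc}" for trait, desc in RUBRIC.items())
    return (
        "You are an IELTS writing examiner.\n"
        f"Score the essay on each trait (band 0-9) using this rubric:\n{criteria}\n\n"
        "For each trait, first quote evidence from the essay, explain how it meets or "
        "misses the descriptor, then state the band. Finish with one line per trait in "
        "the form '<trait>: <band>'.\n\n"
        f"Essay:\n{essay}\n"
    )

def quadratic_weighted_kappa(human_bands: list[int], model_bands: list[int]) -> float:
    """Trait-level scoring agreement (QWK), the metric reported in the abstract."""
    return cohen_kappa_score(human_bands, model_bands, weights="quadratic")

if __name__ == "__main__":
    print(build_rubric_cot_prompt("Some people think that universities should ..."))
    # Toy example: integer bands from a human rater vs. a model, one value per trait.
    print(quadratic_weighted_kappa([6, 7, 5, 8], [6, 6, 5, 8]))
```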
