MultiLLM – Self Reflect Iterative Prompt Methodology based Automated Essay Scoring System

Abstract

Although the use of Large Language Models (LLMs) for essay scoring is not new, these models do not grade in the same manner as humans. This discrepancy arises because humans adapt their grading patterns to the specific questions they encounter, whereas existing research typically applies a predefined rubric that fails to accommodate the variability in responses. There has been little systematic research on defining rubrics and prompts tailored to the responses under consideration. To address this gap and provide a structured approach to LLM-based grading, this paper proposes a new methodology that uses multiple LLMs for rubric generation and grading through a process of self-reflection and iteration. The key components of the system are: (1) developing grading rubrics and prompt patterns that account for both the questions asked and the responses provided; (2) iteratively refining rubrics through self-reflection across multiple LLMs to ensure consistent scoring of diverse responses; and (3) applying verification and validation to identify anomalous scores, trigger re-evaluation, and achieve consistency. Experimental evaluations demonstrate that the proposed system offers new insights into the role of LLMs in Automated Essay Scoring (AES).
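
The abstract describes a pipeline of response-aware rubric generation, self-reflective rescoring across multiple LLMs, and anomaly-triggered re-evaluation. The sketch below is a minimal illustration of that pipeline under stated assumptions, not the authors' implementation: the `call_llm` stub, the model names, the 0-10 score scale, the number of reflection rounds, and the median-distance anomaly threshold are all hypothetical placeholders to be replaced by the reader's own LLM API and rubric design.

```python
import statistics

def call_llm(model_name: str, prompt: str) -> str:
    """Hypothetical placeholder for an LLM API call; returns the model's text reply.
    Replace with calls to whichever provider SDK you actually use."""
    raise NotImplementedError

def generate_rubric(model_name: str, question: str, sample_responses: list[str]) -> str:
    # Component (1): draft a rubric conditioned on both the question and
    # the kinds of responses actually observed, rather than a fixed rubric.
    prompt = (
        f"Question: {question}\n"
        f"Sample responses: {sample_responses}\n"
        "Draft a 0-10 scoring rubric tailored to these responses."
    )
    return call_llm(model_name, prompt)

def score_with_reflection(model_name: str, rubric: str, question: str,
                          response: str, rounds: int = 2) -> float:
    # Component (2): score once, then ask the same model to reflect on and
    # revise its own score for a few iterations.
    score = float(call_llm(
        model_name,
        f"Rubric:\n{rubric}\nQuestion: {question}\nResponse: {response}\n"
        "Return only a 0-10 score."))
    for _ in range(rounds):
        score = float(call_llm(
            model_name,
            f"You previously scored this response {score}/10 under the rubric:\n{rubric}\n"
            f"Response: {response}\nReconsider and return only a revised 0-10 score."))
    return score

def grade(question: str, response: str, sample_responses: list[str],
          models: tuple[str, ...] = ("model_a", "model_b", "model_c"),
          anomaly_threshold: float = 2.0) -> tuple[float, str, dict[str, float]]:
    rubric = generate_rubric(models[0], question, sample_responses)
    scores = {m: score_with_reflection(m, rubric, question, response) for m in models}

    # Component (3): flag scores far from the cross-model median as anomalous
    # and re-evaluate them with additional reflection rounds.
    median = statistics.median(scores.values())
    for m, s in scores.items():
        if abs(s - median) > anomaly_threshold:
            scores[m] = score_with_reflection(m, rubric, question, response, rounds=3)

    return statistics.mean(scores.values()), rubric, scores
```

The aggregation rule (mean of the post-re-evaluation scores) and the anomaly criterion (distance from the median) are illustrative choices; the paper's own verification and validation procedure may differ.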
