MultiLLM – Self Reflect Iterative Prompt Methodology based Automated Essay Scoring System

Abstract

Although the use of Large Language Models (LLMs) for essay scoring is not new, these models do not grade in the same manner as humans. This discrepancy arises because humans adapt their grading patterns to the specific questions they encounter, whereas existing research typically applies a predefined rubric that fails to accommodate the variability in responses. There has been little systematic research on defining rubrics and prompts tailored to the responses under consideration. To address this gap and provide a structured approach to LLM-based grading, this paper proposes a new methodology that uses multiple LLMs for rubric generation and grading through a process of self-reflection and iteration. The key components of the system are: (1) developing grading rubrics and prompt patterns that account for both the questions asked and the responses provided; (2) iteratively refining rubrics through self-reflection across multiple LLMs to ensure consistent scoring of diverse responses; and (3) applying verification and validation to identify anomalous scores, trigger re-evaluation, and achieve consistency. Experimental evaluations demonstrate that the proposed system offers new insights into the role of LLMs in Automated Essay Scoring (AES).
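
The abstract describes a pipeline of response-aware rubric generation, self-reflective rescoring across multiple LLMs, and anomaly-triggered re-evaluation. The sketch below is a minimal illustration of that pipeline under stated assumptions, not the authors' implementation: the `call_llm` stub, the model names, the 0-10 score scale, the number of reflection rounds, and the median-distance anomaly threshold are all hypothetical placeholders to be replaced by the reader's own LLM API and rubric design.

```python
import statistics

def call_llm(model_name: str, prompt: str) -> str:
    """Hypothetical placeholder for an LLM API call; returns the model's text reply.
    Replace with calls to whichever provider SDK you actually use."""
    raise NotImplementedError

def generate_rubric(model_name: str, question: str, sample_responses: list[str]) -> str:
    # Component (1): draft a rubric conditioned on both the question and
    # the kinds of responses actually observed, rather than a fixed rubric.
    prompt = (
        f"Question: {question}\n"
        f"Sample responses: {sample_responses}\n"
        "Draft a 0-10 scoring rubric tailored to these responses."
    )
    return call_llm(model_name, prompt)

def score_with_reflection(model_name: str, rubric: str, question: str,
                          response: str, rounds: int = 2) -> float:
    # Component (2): score once, then ask the same model to reflect on and
    # revise its own score for a few iterations.
    score = float(call_llm(
        model_name,
        f"Rubric:\n{rubric}\nQuestion: {question}\nResponse: {response}\n"
        "Return only a 0-10 score."))
    for _ in range(rounds):
        score = float(call_llm(
            model_name,
            f"You previously scored this response {score}/10 under the rubric:\n{rubric}\n"
            f"Response: {response}\nReconsider and return only a revised 0-10 score."))
    return score

def grade(question: str, response: str, sample_responses: list[str],
          models: tuple[str, ...] = ("model_a", "model_b", "model_c"),
          anomaly_threshold: float = 2.0) -> tuple[float, str, dict[str, float]]:
    rubric = generate_rubric(models[0], question, sample_responses)
    scores = {m: score_with_reflection(m, rubric, question, response) for m in models}

    # Component (3): flag scores far from the cross-model median as anomalous
    # and re-evaluate them with additional reflection rounds.
    median = statistics.median(scores.values())
    for m, s in scores.items():
        if abs(s - median) > anomaly_threshold:
            scores[m] = score_with_reflection(m, rubric, question, response, rounds=3)

    return statistics.mean(scores.values()), rubric, scores
```

The aggregation rule (mean of the post-re-evaluation scores) and the anomaly criterion (distance from the median) are illustrative choices; the paper's own verification and validation procedure may differ.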
