Can ChatGPT Replace the Teacher in Assessment? A Review of Research on the Use of Large Language Models in Grading and Providing Feedback

Abstract

This article presents a systematic review of empirical research on the use of large language models (LLMs) for automatically grading student work and providing feedback. The study aimed to determine the extent to which generative artificial intelligence models, such as ChatGPT, can replace teachers in the assessment process. The review was conducted in accordance with PRISMA guidelines and predefined inclusion criteria; ultimately, 42 empirical studies were included in the analysis. The results indicate that the effectiveness of LLMs in grading varies by task type. These models perform well on closed-ended tasks and short-answer questions, often achieving accuracy comparable to that of human evaluators. However, they struggle with complex, open-ended, or subjective assignments that require in-depth analysis or creativity. The quality of the prompts provided to the model and the use of detailed scoring rubrics significantly influence the accuracy and consistency of LLM-generated grades. The findings suggest that LLMs can support teachers by accelerating the grading process and delivering rapid feedback at scale, but they cannot fully replace human judgment. The highest effectiveness is achieved in hybrid assessment systems that combine AI-driven automatic grading with teacher oversight and verification.