Potential Use of ChatGPT for Automated Essay Scoring Based

Abstract

The rapid advancement of Artificial Intelligence (AI) has significantly influenced educational practice, particularly in writing assessment. Automated Essay Scoring (AES) systems offer a promising alternative to traditional scoring methods by enhancing consistency, efficiency, and scalability. However, integrating AI into high-stakes assessments such as IELTS Writing Task 2 requires rigorous evaluation to ensure reliability and alignment with human judgment. This study explores the potential of ChatGPT, an advanced AI language model, as a tool for scoring essays against the IELTS Writing Task 2 criteria—Task Response, Coherence and Cohesion, Lexical Resource, and Grammatical Range and Accuracy. Employing a quantitative associational ex post facto design, 30 essays were scored by both certified human raters and ChatGPT; intra-class correlation coefficients (ICC) were used to assess reliability and MANOVA to compare scoring accuracy. The findings reveal that while ChatGPT demonstrates high internal consistency in scoring, significant discrepancies persist when compared to human raters, particularly in Coherence and Cohesion. These results highlight both the potential and the limitations of ChatGPT in AES, suggesting that it can complement, but not yet replace, human evaluators in complex writing tasks. The study contributes to the ongoing discourse on the role of AI in education, emphasizing the need for further refinement to optimize AI-assisted assessment for fairness and precision. Beyond its theoretical contributions, the study offers practical insights for language educators, testing bodies, and policymakers on how AI can be responsibly integrated into large-scale writing assessments.
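The reliability analysis the abstract describes can be sketched as follows. This is a minimal, self-contained illustration of an intra-class correlation of type ICC(2,1) (two-way random effects, absolute agreement, single rater), one common choice for comparing an AI rater against a human rater; the abstract does not specify which ICC variant the study used, and the scores below are invented for illustration, not the study's data.

```python
def icc2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one row per essay, one column per rater,
            e.g. [[human_score, chatgpt_score], ...].
    """
    n = len(scores)      # number of subjects (essays)
    k = len(scores[0])   # number of raters

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between essays
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)            # mean square, subjects
    msc = ss_cols / (k - 1)            # mean square, raters
    mse = ss_err / ((n - 1) * (k - 1)) # mean square, error

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative band scores (hypothetical): [human, ChatGPT] per essay.
essays = [[6.0, 6.5], [7.0, 7.0], [5.5, 5.0], [8.0, 7.5], [6.5, 6.5]]
print(round(icc2_1(essays), 3))
```

High internal consistency with systematic human–AI discrepancy, as reported in the abstract, would show up here as a lower absolute-agreement ICC than a consistency-type ICC, since ICC(2,1) penalizes constant offsets between raters.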
