AI Feedback in Education: The Impact of Prompt Design and Human Expertise on LLM Performance


Abstract

This article investigates the potential of large language models (LLMs) as tools for generating high-quality feedback in higher education, emphasizing the critical roles of prompt design and human supervision. Addressing challenges such as educators' time constraints and variability in feedback quality, two empirical studies evaluate feedback generated by ChatGPT-4, Claude 3, and Gemini Advanced. Study 1 examines how prompt structure influences feedback quality, contributing robust evidence toward a manual for effective prompt engineering. Study 2 compares 459 pieces of LLM-generated feedback on learning goals from 153 pre-service teachers across nine quality dimensions. Findings reveal that domain specificity and clearly stated criteria in prompts significantly enhance feedback quality, with ChatGPT-4 outperforming the other models in every feedback-quality category except errors. While Claude 3 demonstrates minimal content errors, Gemini Advanced provides balanced but lower-quality feedback. These results underscore that prompt engineering is a learnable skill for educators and students, aligning with calls for AI literacy in education. By combining expertly crafted prompts with human oversight, this research provides a framework for addressing feedback challenges in higher education.