Exploring the Quality and Effectiveness of AI-Generated Feedback in Introductory Programming
Abstract
Feedback is vital in introductory programming courses but often difficult to provide well, because standard compiler error messages tend to be vague and confusing to students. Generative artificial intelligence (GenAI) has emerged as a promising tool for improving feedback in programming education, yet empirical studies of its effectiveness in real educational settings remain limited. Using a design-based research approach, this study examined both the quality and the instructional impact of AI-generated feedback in an introductory Python programming course. Participants were two cohorts of undergraduate students. The first cohort received data-driven feedback (DDF Group, n = 28); the second cohort used an upgraded automated assessment tool and received feedback generated by the Llama 3-8B model (AIF Group, n = 32). Quality was assessed through expert ratings on a 0–5-point rubric and through student perception surveys. Effectiveness was evaluated through debugging performance metrics and final exam scores. Expert evaluation of 1,490 AI-generated feedback messages revealed serious quality problems: the mean rating was 1.84 out of 5, and over 40% of messages received the lowest possible score. Common problems included excessive redundant information, content beyond the scope of students' knowledge, and misleading explanations. Students rated AI-generated feedback as significantly less useful than data-driven feedback. The AIF Group also showed poorer debugging performance and achieved lower final programming exam scores. Contrary to expectations, AI-generated feedback was less effective than data-driven feedback in supporting student learning. This study highlights the need for rigorous design, prompt refinement, and contextual alignment when deploying GenAI tools in educational contexts.