Exploring the Quality and Effectiveness of AI-Generated Feedback in Introductory Programming

Abstract

Feedback is a vital but often challenging component of introductory programming courses, where standard compiler messages are vague and confusing for students. Generative artificial intelligence (GenAI) has emerged as a promising tool for providing improved feedback in programming education, yet empirical studies of its effectiveness in real educational settings are limited. Using a design-based research approach, this study examined both the quality and the instructional impact of AI-generated feedback in an introductory Python programming course. Participants were two cohorts of undergraduate students: the first cohort received data-driven feedback (DDF Group, n = 28), while the second cohort, who used an upgraded automated assessment tool, received feedback generated by the Llama 3-8B model (AIF Group, n = 32). Quality was assessed through expert ratings on a 0–5 point rubric and through student perception surveys. Effectiveness was evaluated through debugging performance metrics and final exam scores. Expert evaluation of 1,490 AI-generated feedback messages revealed concerning quality issues, with a mean rating of 1.84 out of 5 and over 40% of messages receiving the lowest possible score. Common problems included excessive redundant information, content exceeding students' knowledge scope, and misleading explanations. Students reported significantly lower perceived usefulness for AI-generated feedback than for data-driven feedback. The AIF Group also exhibited poorer debugging performance and achieved lower final programming exam scores. Contrary to expectations, AI-generated feedback was less effective than data-driven feedback in supporting student learning. This study highlights the need for rigorous design, prompt refinement, and contextual alignment when deploying GenAI tools in educational contexts.
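
The abstract does not describe how the assessment tool prompts the model. Purely as an illustration of the kind of integration the study evaluates, the sketch below shows how an autograder might request beginner-level feedback on a failing submission from a locally hosted Llama 3-8B model via an OpenAI-compatible chat endpoint. The endpoint URL, model name, prompt wording, and generation settings are assumptions for illustration, not details taken from the study.

```python
# Illustrative sketch only: requesting feedback on a failed Python submission
# from a locally hosted Llama 3-8B model behind an OpenAI-compatible chat API.
# The URL, model name, and prompt are hypothetical, not the study's actual setup.
import requests

API_URL = "http://localhost:8000/v1/chat/completions"  # hypothetical local server
MODEL = "llama-3-8b-instruct"                           # hypothetical model identifier


def generate_feedback(student_code: str, error_message: str) -> str:
    """Ask the model for short, beginner-level feedback on a failing submission."""
    prompt = (
        "You are a teaching assistant for an introductory Python course.\n"
        "Explain the error below in plain language for a beginner, without "
        "giving the full corrected solution.\n\n"
        f"Student code:\n{student_code}\n\nError message:\n{error_message}"
    )
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # keep feedback relatively consistent across runs
        "max_tokens": 300,   # cap length to limit the redundancy the study reports
    }
    response = requests.post(API_URL, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    code = "total = 0\nfor i in range(10)\n    total += i"
    error = "SyntaxError: expected ':'"
    print(generate_feedback(code, error))
```

The study's findings (redundant, out-of-scope, or misleading feedback) suggest that prompt constraints like the length cap and the "no full solution" instruction above are exactly the kind of design choices that need careful refinement and evaluation before classroom deployment.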
