Benefit or Bottleneck? Assessing the Impact of Structured Reflection on Learning from AI-Driven Explanatory Feedback

Abstract

As AI tutors become increasingly capable of delivering rich, personalized feedback at scale, a key challenge remains: novice learners often struggle to process detailed explanations on their own. Structured reflection, grounded in decades of self-explanation research, is a theoretically compelling solution. By helping learners parse feedback and prompting them to actively interpret it, reflection activities are designed to reduce cognitive overload and deepen understanding. But does adding reflection to already rich AI-generated feedback actually help, or does it simply add friction? We tested this in a randomized experiment comparing Python practice with AI-generated, personalized feedback to "reflective practice," which paired identical feedback with structured self-explanation prompts. Contrary to our predictions, reflection did not improve performance on any measure. Instead, it proved to be a temporal bottleneck: it doubled time spent on feedback and reduced practice volume by 40%, without making each learning opportunity more effective. Learners who cycled through more practice-and-feedback iterations outperformed reflective learners at the end of the session and maintained a small, nonsignificant advantage on transfer. Notably, reflection did not provide the scaffolding benefit we predicted for novices; when individual differences did emerge, they instead favored higher-volume practice for more knowledgeable learners. Both practice conditions also substantially outperformed a high-quality video baseline (d = 0.66–0.93), replicating the benefits of active practice with AI feedback over passive instruction. These findings suggest that when AI feedback is already elaborated and personalized, self-explanation activities may be a redundant time sink. As AI-generated feedback reaches learners at scale, the results underscore the necessity of empirically validating pedagogical scaffolds, even those with strong theoretical support, before deploying them broadly.