Leveraging Knowledge Profiles and Generative AI for Realistic Student Response Generation

Abstract

The scarcity of high-quality labeled data often hinders the effective use of automated formative feedback in education. While analytic rubrics offer a reliable framework for automated grading, training robust models still requires hundreds of expert-labeled responses, an expensive and time-consuming process. This paper proposes a methodology for generating diverse, rubric-aligned synthetic student responses using large language models (LLMs) guided by knowledge profiles and representative examples. We introduce two profile-based generation strategies, straightforward and error-informed, and evaluate them against a dataset (N = 585) of authentic open-ended logical-proof responses from a Discrete Mathematics course. We analyze the diversity and realism of the generated datasets using embedding-based distance metrics and principal component analysis (PCA), and assess their utility for training automated grading models. Our results show that synthetic responses are less diverse than authentic ones, and that models trained solely on generated data perform worse than those trained on real data. However, combining small authentic datasets with generated data significantly improves model performance, suggesting that synthetic data is an effective augmentation strategy in low-resource educational settings.
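The abstract mentions comparing the diversity of synthetic and authentic responses with embedding-based distance metrics and PCA. The sketch below is a rough illustration of that kind of analysis, not the authors' implementation: it assumes a sentence-transformer encoder (the model name is a placeholder), uses mean pairwise cosine distance as the diversity proxy, and projects both sets into a shared 2-D PCA space; the example responses are hypothetical.

```python
# Illustrative sketch (not the paper's code) of an embedding-based
# diversity comparison between authentic and synthetic responses.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import pairwise_distances
from sentence_transformers import SentenceTransformer  # assumed embedding backend

def mean_pairwise_distance(embeddings: np.ndarray) -> float:
    """Average cosine distance over all pairs; a rough diversity proxy."""
    d = pairwise_distances(embeddings, metric="cosine")
    n = len(embeddings)
    return d[np.triu_indices(n, k=1)].mean()

model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical encoder choice

authentic = ["Assume n is even, so n = 2k for some integer k ...", "..."]  # real student proofs
synthetic = ["Let n be even; then n = 2k ...", "..."]                      # LLM-generated proofs

emb_auth = model.encode(authentic)
emb_syn = model.encode(synthetic)

print("authentic diversity:", mean_pairwise_distance(emb_auth))
print("synthetic diversity:", mean_pairwise_distance(emb_syn))

# Fit PCA on both sets together so their spreads can be compared in one space.
pca = PCA(n_components=2).fit(np.vstack([emb_auth, emb_syn]))
auth_2d, syn_2d = pca.transform(emb_auth), pca.transform(emb_syn)
```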
