Optimizing GPT-Based Distractor Generation for the Korean CSAT English Exam
Abstract
High-quality distractors are essential in multiple-choice questions to assess student understanding and diagnose misconceptions; however, constructing these distractors manually is labor-intensive. This study presents the first large-scale investigation of automated distractor generation (ADG) for the English section of Korea’s College Scholastic Ability Test (CSAT), a high-stakes English as a Foreign Language (EFL) exam characterized by consistent item design and linguistic constraints. We implement and evaluate three ADG approaches using GPT-4.1: supervised fine-tuning on a curated CSAT dataset, in-context learning with a novel distractor attractiveness metric to guide exemplar retrieval, and Chain-of-Scaffolds, a prompting strategy inspired by educational scaffolding theory that decomposes distractor generation into reasoning stages. Across 80 unseen items from recent CSAT administrations, supervised fine-tuning achieves the highest semantic and lexical alignment with ground-truth distractors. In-context learning retrieves more pragmatically effective exemplars, producing distractor sets that best approximate realistic answer distributions. The Chain-of-Scaffolds method yields distractors that simulate test-taker misconceptions while minimizing confusion with the correct answer. These findings underscore the value of pedagogically grounded prompting and data-informed retrieval in high-stakes language assessment, and they suggest that ADG strategies should align with instructional contexts: for example, prioritizing fine-tuning for nationwide standardized exams, or selecting in-context learning for classroom diagnostics that require adaptability and rapid deployment.
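
To make the exemplar-retrieval step concrete, the short Python sketch below shows one way an attractiveness score could rank items for in-context learning. The score, the CSATItem fields, and the names attractiveness and build_icl_prompt are illustrative assumptions, not the paper's actual metric or pipeline.

# Minimal sketch under stated assumptions: a hypothetical attractiveness
# score (share of examinees lured by the distractors) ranks bank items,
# and the top-k exemplars are assembled into a few-shot prompt for a
# model such as GPT-4.1.
from dataclasses import dataclass

@dataclass
class CSATItem:
    passage: str
    question: str
    answer: str
    distractors: list[str]              # the incorrect options
    distractor_pick_rates: list[float]  # hypothetical: share of examinees choosing each distractor

def attractiveness(item: CSATItem) -> float:
    # Hypothetical proxy: total share of examinees drawn to the distractors.
    return sum(item.distractor_pick_rates)

def build_icl_prompt(item_bank: list[CSATItem], passage: str,
                     question: str, answer: str, k: int = 3) -> str:
    # Retrieve the k most attractive exemplars, then append the target item
    # with its distractor slot left blank for the model to complete.
    exemplars = sorted(item_bank, key=attractiveness, reverse=True)[:k]
    target = CSATItem(passage, question, answer, [], [])
    parts = ["Write four plausible distractors for the final CSAT English item."]
    for ex in exemplars + [target]:
        parts.append(f"Passage: {ex.passage}")
        parts.append(f"Question: {ex.question}")
        parts.append(f"Answer: {ex.answer}")
        parts.append("Distractors: " + "; ".join(ex.distractors))
    return "\n".join(parts)

In this sketch, higher-scoring exemplars are assumed to model more plausible misconceptions, so ranking by the score is a stand-in for the data-informed retrieval the abstract describes.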