Optimizing GPT-Based Distractor Generation for the Korean CSAT English Exam

Abstract

High-quality distractors are essential in multiple-choice questions to assess student understanding and diagnose misconceptions; however, constructing these distractors manually is labor-intensive. This study presents the first large-scale investigation of automated distractor generation (ADG) for the English section of Korea’s College Scholastic Ability Test (CSAT), a high-stakes exam of English as a Foreign Language (EFL) characterized by consistent item design and linguistic constraints. We implement and evaluate three ADG approaches using GPT-4.1: supervised fine-tuning on a curated CSAT dataset; in-context learning with a novel distractor attractiveness metric to guide exemplar retrieval; and Chain-of-Scaffolds, a prompting strategy inspired by educational scaffolding theory that decomposes distractor generation into reasoning stages. Across 80 unseen items from recent CSAT administrations, supervised fine-tuning achieves the highest semantic and lexical alignment with ground-truth distractors. In-context learning retrieves more pragmatically effective examples, producing distractor sets that best approximate realistic answer distributions. The Chain-of-Scaffolds method yields distractors that simulate test-taker misconceptions while minimizing confusion with the correct answer. These findings underscore the value of pedagogically grounded prompting and data-informed retrieval in high-stakes language assessment. They also suggest that ADG strategies should align with instructional contexts, for example by prioritizing fine-tuning for nationwide standardized exams or selecting in-context learning for classroom diagnostics that require adaptability and rapid deployment.