Optimizing GPT-Based Distractor Generation for the Korean CSAT English Exam
Abstract
High-quality distractors are essential in multiple-choice questions to assess student understanding and diagnose misconceptions; however, constructing these distractors manually is labor-intensive. This study presents the first large-scale investigation of automated distractor generation (ADG) for the English section of Korea’s College Scholastic Ability Test (CSAT), a high-stakes English as a Foreign Language (EFL) exam characterized by consistent item design and linguistic constraints. We implement and evaluate three ADG approaches using GPT-4.1: supervised fine-tuning on a curated CSAT dataset, in-context learning with a novel distractor attractiveness metric to guide exemplar retrieval, and Chain-of-Scaffolds, a prompting strategy inspired by educational scaffolding theory that decomposes distractor generation into reasoning stages. Across 80 unseen items from recent CSAT administrations, supervised fine-tuning achieves the highest semantic and lexical alignment with ground-truth distractors. In-context learning retrieves more pragmatically effective exemplars, producing distractor sets that best approximate realistic answer distributions. The Chain-of-Scaffolds method yields distractors that simulate test-taker misconceptions while minimizing confusion with the correct answer. These findings underscore the value of pedagogically grounded prompting and data-informed retrieval in high-stakes language assessment, and they suggest that ADG strategies should align with instructional contexts: for example, prioritizing fine-tuning for nationwide standardized exams, or selecting in-context learning for classroom diagnostics that require adaptability and rapid deployment.
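
To make the exemplar-retrieval step concrete, the short Python sketch below shows one way an attractiveness score could rank items for in-context learning. The score, the CSATItem fields, and the names attractiveness and build_icl_prompt are illustrative assumptions, not the paper's actual metric or pipeline.

# Minimal sketch under stated assumptions: a hypothetical attractiveness
# score (share of examinees lured by the distractors) ranks bank items,
# and the top-k exemplars are assembled into a few-shot prompt for a
# model such as GPT-4.1.
from dataclasses import dataclass

@dataclass
class CSATItem:
    passage: str
    question: str
    answer: str
    distractors: list[str]              # the incorrect options
    distractor_pick_rates: list[float]  # hypothetical: share of examinees choosing each distractor

def attractiveness(item: CSATItem) -> float:
    # Hypothetical proxy: total share of examinees drawn to the distractors.
    return sum(item.distractor_pick_rates)

def build_icl_prompt(item_bank: list[CSATItem], passage: str,
                     question: str, answer: str, k: int = 3) -> str:
    # Retrieve the k most attractive exemplars, then append the target item
    # with its distractor slot left blank for the model to complete.
    exemplars = sorted(item_bank, key=attractiveness, reverse=True)[:k]
    target = CSATItem(passage, question, answer, [], [])
    parts = ["Write four plausible distractors for the final CSAT English item."]
    for ex in exemplars + [target]:
        parts.append(f"Passage: {ex.passage}")
        parts.append(f"Question: {ex.question}")
        parts.append(f"Answer: {ex.answer}")
        parts.append("Distractors: " + "; ".join(ex.distractors))
    return "\n".join(parts)

In this sketch, higher-scoring exemplars are assumed to model more plausible misconceptions, so ranking by the score is a stand-in for the data-informed retrieval the abstract describes.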