Theory-Prompt-Validation: A Practice-Oriented Approach to Using LLMs for Verbal Coding in the Learning Sciences
Abstract
Large Language Models (LLMs) hold growing potential for scaling the analysis of qualitative data in the learning sciences. One essential step in qualitative analysis is coding verbal data. Although several approaches have been proposed for automating coding with LLMs, few appear well-suited to the needs of the learning sciences, where coding requires identifying descriptive content categories and pedagogical functions of utterances, often implicit and difficult to detect. To address this challenge, we provide the Theory–Prompt–Validation (TPV) approach, a three-step process comprising theory modeling, prompt engineering, and validation. This approach provides a blueprint for evidence-based application of LLMs for context-sensitive coding of verbal data. The TPV approach builds on existing methods while incorporating requirements specific to the learning sciences. We emphasize a theory-driven, evidence-based process, including validation analyses to verify that pragmatic functions are accurately captured in coding. To illustrate the TPV approach, we applied it to AI-student tutoring dialogues. We developed a theory-driven codebook and implemented it in a Python-based script leveraging GPT-4o to segment utterances and assign codes automatically. Intercoder agreement between the LLM and a human coder was substantial (κ = .73, 95% CI [0.69, 0.76]) and descriptively higher than between two human coders (κ = .69, 95% CI [0.66, 0.73]). Validity analyses revealed theoretically meaningful patterns, such as positive effects when the code feedback was assigned more frequently. Our work demonstrates that LLM-based coding can reliably and validly scale the analysis of verbal data, bridging the gap between time-intensive qualitative methods and large-scale, evidence-based educational research.
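For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's κ and a confidence interval for it can be computed from two coders' labels. It is not the authors' script: the function names, the toy labels, and the use of a percentile bootstrap for the interval are all assumptions made for illustration, since the abstract does not state how the reported intervals were obtained.

```python
import random
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two equally long sequences of categorical codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must label the same non-empty set of segments.")
    n = len(codes_a)
    # Observed agreement: share of segments given the same code by both coders.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: product of the coders' marginal proportions, summed over codes.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Assumed convention: both coders used one identical code throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


def bootstrap_ci(codes_a, codes_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for kappa (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(codes_a)
    stats = []
    for _ in range(n_boot):
        # Resample segments with replacement, keeping each pair of codes together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(cohen_kappa([codes_a[i] for i in idx],
                                 [codes_b[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Made-up toy labels, not data from the study.
human = ["feedback", "feedback", "question", "question"]
llm = ["feedback", "feedback", "question", "feedback"]

kappa = cohen_kappa(human, llm)  # 0.5: observed 0.75, chance 0.5
ci_lower, ci_upper = bootstrap_ci(human, llm)
```

With only four segments the bootstrap interval is very wide. The sketch is meant to show the mechanics, not to reproduce the values reported in the abstract.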