A Taylorizable Process for Textual Detector Development
Abstract
Textual detectors of cognitive constructs play a central role in educational data mining, learning analytics, and automated assessment, enabling scalable analysis of student thinking, discourse, and learning processes. Recent work has increasingly adopted LLM-as-a-judge approaches, which replace traditional supervised learning with rubric-based prompting of large language models. While these approaches substantially reduce development effort, they are often deployed without the forms of rigor historically emphasized in EDM and related communities. At the same time, fully rigorous detector development remains costly and slow, creating pressure to adopt lower-quality shortcuts. In this paper, we propose a Taylorizable process for developing textual detectors of cognitive constructs using LLMs, attempting to reconcile methodological rigor with efficiency and scalability. Drawing on prior detector development practices and principles of scientific management, we decompose detector construction into a standardized, end-to-end workflow encompassing construct definition, human coding and reliability checking, LLM-based detector construction, evaluation, and application at scale. We argue that this process supports four essential forms of rigor: assessment of goodness, generalizability, construct validity, and auditability, while enabling streamlined execution by mixed-expertise teams. By making rigorous detector development more reproducible, auditable, and efficient, this framework aims to raise methodological standards for LLM-based measurement while preserving the practical advantages, including feasibility and lower development effort, that have driven their rapid adoption.