Architectural Transparency in LLM-Based Cognitive Assessment: A Multidimensional TRACE-ED Evaluation of Single-Agent and Multi-Agent Systems
Abstract
Purpose: The rapid integration of Large Language Models (LLMs) into educational assessment systems has intensified the need for rigorous transparency validation beyond accuracy and score agreement. While prior research primarily evaluates automated grading systems through reliability and human–AI alignment metrics, limited attention has been given to how architectural design influences transparency properties. This study proposes TRACE-ED, a multidimensional transparency framework that operationalizes transparency across five dimensions: Reliability (R), Alignment (A), Claim–Evidence Grounding (C), Explanation Coherence (E), and Disclosure/Auditability (D). Transparency is conceptualized as an architectural vector rather than a scalar attribute.
Methods: Using a Design Science Research methodology and a Monte Carlo experimental design, this study compares single-agent and multi-agent LLM architectures across 10 synthetic short-answer responses, three rubric indicators, multiple temperature settings, and 15 repetitions per condition (total runs = 900). Reliability is evaluated using the Intraclass Correlation Coefficient, ICC(1,k), while grounding and coherence are measured through semantic similarity thresholds and polarity alignment metrics. Statistical comparisons employ Welch's t-test and Cohen's d effect sizes.
Results: Results demonstrate that architectural modularization preserves high reliability (Multi-Agent ICC(1,k) = 0.9921) while significantly increasing semantic grounding (GR = 0.763; d = 3.72) and explanation coherence (CS = 0.782; d = 7.90). A minor increase in contradiction rate (CR = 0.031; d = 0.43) indicates limited coordination trade-offs. These findings confirm that transparency is multidimensional and architecture-sensitive: multi-agent systems redistribute transparency components rather than uniformly maximizing them.
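The statistics named in Methods can be sketched in a few lines of NumPy. The following is a minimal illustration, not the authors' implementation: ICC(1,k) is computed from a one-way ANOVA decomposition over a targets-by-repetitions score matrix, and Welch's t and pooled-SD Cohen's d compare two metric samples. The simulated score matrix at the bottom is entirely hypothetical.

```python
import numpy as np

def icc_1k(scores):
    """ICC(1,k), one-way random effects, average measures.

    scores: (n_targets, k) matrix of k repeated scores per target.
    """
    n, k = scores.shape
    row_means = scores.mean(axis=1)
    # Between-target and within-target mean squares from one-way ANOVA
    msb = k * ((row_means - scores.mean()) ** 2).sum() / (n - 1)
    msw = ((scores - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (msb - msw) / msb

def welch_t(a, b):
    """Welch's t statistic (unequal variances)."""
    return (a.mean() - b.mean()) / np.sqrt(
        a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))

def cohens_d(a, b):
    """Cohen's d with pooled standard deviation."""
    na, nb = len(a), len(b)
    sp = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                 / (na + nb - 2))
    return (a.mean() - b.mean()) / sp

# Hypothetical data: 10 responses, each re-scored over 15 Monte Carlo repetitions
rng = np.random.default_rng(0)
scores = rng.normal(3.0, 1.0, size=(10, 1)) + rng.normal(0, 0.05, size=(10, 15))
print(round(icc_1k(scores), 4))  # high ICC: repetitions agree closely per response
```

High ICC(1,k) values, as reported above, arise when between-response variance dominates within-response (repetition) variance.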
Conclusion: This study advances explainable AI research in education by formalizing transparency as a measurable architectural vector, introducing Monte Carlo–based stochastic validation for LLM assessment systems, and providing empirical evidence of transparency trade-offs in modular AI design. The TRACE-ED framework offers a scalable and auditable methodology for evaluating AI-driven grading systems in high-stakes educational contexts.