Psychometrics is all you need
Abstract
This paper argues that AI development and evaluation should be re-imagined within a psychometric validity framework long used by the measurement and assessment community. It frames the problem of evaluating AI using the same principled structure applied to human trait evaluation. The paper presents an evidence-centered, psychometric framework that guides generative AI model applications from construct-driven design through evaluation and validation. While the literature has suggested applying psychometric methods and tests to certain components of AI evaluation, a broader, overarching approach has not been comprehensively connected to AI development and evaluation, nor have these ideas gained widespread adoption. I explicitly connect evidence-centered design models to the generative AI application pipeline to optimize alignment between intended claims, model outputs, and their evaluation. So that evaluation plans support arguments for the use of AI system outputs, I propose collecting validity evidence, and I use a case study to demonstrate this paradigm. This paper also provides guiding questions for developers and evaluators and highlights how psychometric methods might be used for AI.