Navigating the Maze of Measurement: Large Language Models for objective instrument selection


Abstract

Background: The proliferation of psychological measures and constructs has led to critical conceptual fragmentation, complicating instrument selection and undermining content validity. Traditional expert-based procedures are often resource-intensive and subjective, underscoring the need for objective, scalable assessment methods.

Aims and Methods: This study evaluates the capability of Large Language Models, comparing embedding-based semantic similarity and prompt-based generative approaches, to perform scalable content assessments. The methodology was validated across 13 scales for Internet Gaming Disorder (IGD) and applied to 7 established measures of depression, benchmarking results against theoretical criteria and expert consensus.

Results: Models demonstrated strong capability across two key tasks. Generative models achieved high classification accuracy (Cohen's κ up to 0.90) in mapping scale items to their theoretically intended symptoms. Furthermore, aggregated semantic similarity derived from embedding models correlated strongly with overall expert rankings of scales' content validity (ρ = 0.89), validating its use for objective instrument triage. Notably, the open-source model (intfloat/e5-large-v2) replicated expert consensus in the depression application. However, models struggled to reliably reproduce fine-grained symptom-level quality assessments.

Conclusion: Both embedding and generative models offer a powerful, scalable, and theory-referenced heuristic for psychometric triage. By providing an objective, cost-effective way to rank item-construct alignment, this approach helps researchers efficiently select the measurement tools with the highest content validity for a predefined construct.
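To make the embedding-based triage idea concrete, the following is a minimal sketch, not the paper's exact procedure. It assumes item and criterion texts have already been embedded (in practice by a model such as intfloat/e5-large-v2, named in the abstract; here toy vectors stand in), and the max-then-mean aggregation of item-criterion cosine similarities is an illustrative assumption.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def scale_alignment(item_embs: list, criterion_embs: list) -> float:
    """Aggregate item-criterion semantic similarity for one scale.

    For each scale item, take its best-matching criterion similarity,
    then average across items -- a simple content-alignment score that
    could be used to rank candidate scales for triage.
    """
    best_per_item = [
        max(cosine_sim(item, crit) for crit in criterion_embs)
        for item in item_embs
    ]
    return sum(best_per_item) / len(best_per_item)

if __name__ == "__main__":
    # Toy 2-d "embeddings": one diagnostic criterion, two scale items.
    criteria = [np.array([1.0, 0.0])]
    items = [np.array([1.0, 0.0]), np.array([0.7, 0.7])]
    print(round(scale_alignment(items, criteria), 3))
```

A real application would embed each item and each theoretical symptom description, compute this score per scale, and rank the scales, with the ranking then compared against expert consensus as the abstract describes.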
