Generative Psychometrics via AI-GENIE: Automatic Item Generation with Network-Integrated Evaluation
Abstract
The rapid advancement of artificial intelligence (AI), particularly large language models (LLMs), has introduced powerful tools for many research domains, including psychological scale development. This study presents a methodology for efficiently generating and selecting high-quality, non-redundant items for psychological assessments using LLMs and network psychometrics. Our approach, termed Automatic Item Generation with Network-Integrated Evaluation (AI-GENIE), reduces reliance on expert intervention by integrating generative AI with the latest network psychometric techniques. We evaluated the efficacy of AI-GENIE through Monte Carlo simulations in which the Mixtral, Gemma 2, Llama 3, GPT-3.5, and GPT-4o models generated item pools that mimic Big Five personality assessments. Additionally, items from AI-GENIE were empirically tested with five nationally representative U.S. samples (total N = 4,964), demonstrating that AI-GENIE-generated scales achieve structural validity comparable to traditional expert-developed measures. The results showed improvements in item selection efficiency, with overall average increases of 8.68 to 20.03 in normalized mutual information in the final item pool across all models. We also present a simulation study on the emerging construct of AI Anxiety to demonstrate AI-GENIE’s utility for underrepresented constructs. Results from newly released models (DeepSeek, GPT-OSS 20B, GPT-OSS 120B) are presented in the Appendix. Our findings suggest that AI-GENIE is a highly effective tool for streamlining the scale development and validation process.
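For readers unfamiliar with normalized mutual information (NMI), the metric reported above, the following is a minimal sketch of how it compares the trait each item was intended to measure with the community the item is assigned to by a network analysis. It assumes the common arithmetic-mean normalization, 2·I(U;V) / (H(U) + H(V)); the abstract does not state which normalization or scaling AI-GENIE uses. The function names and the toy item labels are hypothetical and are not taken from the AI-GENIE software.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def normalized_mutual_information(a, b):
    """NMI with arithmetic-mean normalization (an assumed choice)."""
    if len(a) != len(b):
        raise ValueError("both labelings must cover the same items")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        # Both labelings put every item in a single cluster: identical.
        return 1.0
    return 2 * mutual_information(a, b) / (h_a + h_b)


# Hypothetical example: six items written for three traits.
intended = ["O", "O", "C", "C", "E", "E"]
recovered = [1, 1, 2, 2, 3, 3]   # communities match the intended traits
collapsed = [1, 1, 1, 1, 1, 1]   # every item lands in one community

nmi_recovered = normalized_mutual_information(intended, recovered)
nmi_collapsed = normalized_mutual_information(intended, collapsed)
```

Under this definition, an item pool whose detected communities reproduce the intended traits scores close to 1, while a pool that collapses into a single community scores 0, so a higher NMI in the final pool indicates that the retained items separate more cleanly by trait.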