Personality Auto-Scoring with Large Language Models Using a Realistic Accuracy Model of Behavioral Cues in Chatbot Interviews
Abstract
Advances in artificial intelligence, particularly large language models (LLMs), have opened new possibilities for automating personality assessment through text-based chatbot interviews. While prior research has applied machine learning (ML) and natural language processing (NLP) methods to score interview responses, these approaches often lack a strong theoretical foundation for extracting and interpreting trait-relevant behavioral cues. In this study, we integrate Funder’s Realistic Accuracy Model (RAM) into LLM-based auto-scoring to enhance the identification and utilization of behavioral cues in personality evaluation. We use two archival samples (N = 521) to examine the alignment between LLM-derived personality scores and established measures, including human-coded ratings from behavioral description and narrative interviews, as well as self-reported Big Five personality assessments. We compare results from a job-focused behavioral interview (Sample 1; N = 218) and narrative identity interviews (Sample 2; N = 303). In the behavioral interview sample, RAM-based LLM prompts demonstrated stronger convergence with human ratings than zero-shot prompts. In the narrative interview sample, however, this advantage was attenuated: RAM-based and zero-shot LLM scores showed similar convergence. In Study 2, we analyzed differences between the behavioral cues extracted by LLMs and those identified by human raters to better understand the rating reasoning process. Similarity analyses revealed moderate overlap between LLM-extracted and human-annotated cues. These findings suggest that theory-guided LLMs can identify behavioral cues that partially overlap with those used by humans. Limitations and implications for scalable, accurate, and interpretable AI-based personality assessment are discussed.