Evaluating Personality Traits of Large Language Models Through Scenario-based Interpretive Benchmarking
Abstract
The assessment of Large Language Models (LLMs) has traditionally focused on performance metrics tied directly to their task-solving capabilities. This paper introduces a novel benchmark explicitly designed to measure personality traits in LLMs through scenario-based interpretive prompts. We detail the methodology behind this benchmark, in which LLMs are presented with structured prompts inspired by psychological scenarios and their responses are scored by a judge LLM. The evaluation covers traits such as emotional stability, creativity, adaptability, and anxiety levels, among others. Consistency of the assigned scores across different judge models is assessed through consensus analysis. Anecdotal observations on score validity and on the orthogonality of the scores with conventional performance metrics are discussed. Results, implementation scripts, and updated leaderboards are publicly accessible at https://github.com/fit-alessandro-berti/llm-dreams-benchmark.
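As a rough illustration of the pipeline sketched in the abstract, the following Python snippet shows how scenario prompts, a target LLM, and several judge LLMs could be wired together to produce per-trait scores. The trait list, rating scale, prompt wording, and the `target`/`judge` callables are hypothetical placeholders, not the benchmark's actual configuration; the authors' implementation scripts are in the linked repository.

```python
# Minimal sketch of a scenario-based, judge-scored personality evaluation.
# All names below (TRAITS, score_response, evaluate_model, target, judges)
# are illustrative assumptions, not the benchmark's real API.
from statistics import mean
from typing import Callable, Dict, List

TRAITS = ["emotional stability", "creativity", "adaptability", "anxiety"]

def score_response(scenario: str, response: str, trait: str,
                   judge: Callable[[str], str]) -> float:
    """Ask one judge LLM to rate one response on one trait (1-10 scale)."""
    prompt = (
        f"Scenario: {scenario}\n"
        f"Model response: {response}\n"
        f"On a scale from 1 to 10, rate the {trait} expressed in the response. "
        f"Answer with a single number."
    )
    return float(judge(prompt).strip())

def evaluate_model(scenarios: List[str],
                   target: Callable[[str], str],
                   judges: Dict[str, Callable[[str], str]]) -> Dict[str, float]:
    """Average each trait score over all scenarios and all judge LLMs."""
    per_trait: Dict[str, List[float]] = {t: [] for t in TRAITS}
    for scenario in scenarios:
        response = target(scenario)          # response of the evaluated LLM
        for trait in TRAITS:
            for judge in judges.values():    # consensus across judge models
                per_trait[trait].append(
                    score_response(scenario, response, trait, judge)
                )
    return {trait: mean(scores) for trait, scores in per_trait.items()}
```

Averaging over multiple judges, as in `evaluate_model`, is one simple way to realize the consensus analysis mentioned above; the repository should be consulted for the actual aggregation used.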