Personality Recognition Models Based on Visual Cues: Exploring the Intrinsic Links between PAD Dynamics and Personality Traits

Abstract

Understanding human personality from nonverbal behavior is a longstanding challenge in psychology and artificial intelligence. This study presents a video-based, automated framework for personality recognition that integrates dynamic visual cues with the PAD (Pleasure-Arousal-Dominance) emotional state model. We introduce the Cross-Modal Attention Vision Transformer (CMA-ViT), a dual-stream, multi-task learning model that fuses raw video frames with pre-extracted features, including facial action units, head motion, gaze, and frame-by-frame PAD values. The model captures temporal dynamics in emotional expression, head motion, and gaze patterns to infer the Big Five personality traits.

Experiments on the MDPE dataset demonstrate robust performance, with an average classification accuracy of 71.7%, highest for Neuroticism (90.1%) and lowest for Openness (57.4%), suggesting that not all personality traits are explicitly expressed in observable behavior. Gradient-weighted feature importance analysis revealed that PAD emotional features, gaze patterns, and head-related cues are the primary contributors, while facial action units introduced noise in this dataset. Temporal analysis of PAD fluctuations further indicated that indices such as variability, frequency, intensity, and transition rate carry trait-relevant signals, supporting the notion that personality is reflected not only in average emotional states but also in dynamic patterns of emotional change.

These findings have methodological and theoretical implications: they highlight the value of integrating multi-dimensional information (such as temporal emotional dynamics, head motion, and gaze) for accurate personality recognition, challenge assumptions about the predictive role of facial actions, and empirically support dynamic models of personality such as Fleeson's Density Distribution Theory. This work provides a novel, interpretable framework for video-based personality computing, advancing both the accuracy and theoretical grounding of automated trait inference.
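To make the temporal PAD indices mentioned above concrete, the following is a minimal illustrative sketch of how variability, frequency, intensity, and transition rate could be computed from a frame-by-frame series for one PAD dimension. The function name, the direction-change definition of frequency, and the `delta` threshold are assumptions for illustration, not the paper's actual implementation.

```python
from statistics import mean, stdev

def pad_dynamics(series, delta=0.1):
    """Illustrative temporal indices for one PAD dimension
    (e.g. frame-by-frame pleasure values in [-1, 1]).
    All definitions here are assumptions, not taken from the paper."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    variability = stdev(series)               # spread of emotional states
    intensity = mean(abs(v) for v in series)  # mean absolute level
    # frequency: number of direction reversals in the signal
    signs = [1 if d > 0 else -1 for d in diffs if d != 0]
    frequency = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    # transition rate: fraction of frame-to-frame changes larger than delta
    transition_rate = sum(1 for d in diffs if abs(d) > delta) / len(diffs)
    return {"variability": variability, "intensity": intensity,
            "frequency": frequency, "transition_rate": transition_rate}

# Hypothetical per-frame pleasure values for one clip
pleasure = [0.2, 0.5, 0.4, 0.7, 0.6, 0.1, 0.3]
print(pad_dynamics(pleasure))
```

Per-dimension indices like these would then be aggregated per video and fed to the trait classifier alongside the other visual features.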
