Multimodal large language models converge on the human-like geometry of abstract emotion


Abstract

Understanding whether artificial intelligence (AI) systems represent abstract concepts in a human-like manner is pivotal for developing trustworthy AI. While recent work has aligned model representations with human concepts of concrete visual objects, it remains unclear whether such alignment extends to the subjective and context-dependent domain of emotion. Here, we investigate the emergent affective geometry in large language models (LLMs) and multimodal LLMs (MLLMs) through a large-scale, unsupervised "machine-behavioral" paradigm. By deriving 30-dimensional embeddings from over 12 million triplet odd-one-out judgments on 2,180 emotionally evocative videos, we reveal a sophisticated "hybrid" geometry. This structure synthesizes categorical clusters with continuous dimensions and shows strong, selective correlations with human ratings across 34 emotion categories and 14 affective dimensions, effectively reconciling the long-standing category-versus-dimension debate in affective science. To demonstrate the operational utility of these representations, we introduce a generative editing framework and show that manipulating specific affective components steers generated video content in a predictable, human-interpretable manner. Crucially, at the neural level, the MLLM-derived affective space predicts human fMRI activity in high-level social-emotional regions (e.g., the temporoparietal junction) with accuracy matching or exceeding that of traditional human self-report ratings. These findings demonstrate that MLLMs converge on a biologically plausible, brain-aligned representational scheme for abstract emotion, distinguishing them from models of pure visual perception and establishing a framework for artificial social intelligence.
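
The abstract does not specify how the embeddings are fitted to the triplet odd-one-out judgments. As a rough illustration only, the sketch below assumes a common choice in odd-one-out embedding work: a softmax over pairwise dot products, where the pair judged most similar leaves the third item as the odd one out. The function names (`odd_one_out_probs`, `triplet_nll`) and the toy 3-dimensional vectors are hypothetical and are not taken from the article.

```python
import math

def odd_one_out_probs(x_i, x_j, x_k):
    """Probability that each item of a triplet is the odd one out.

    Assumption (not from the article): choice probabilities follow a
    softmax over the dot-product similarities of the three pairs.
    """
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    s_ij, s_ik, s_jk = dot(x_i, x_j), dot(x_i, x_k), dot(x_j, x_k)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(s_ij, s_ik, s_jk)
    e_ij, e_ik, e_jk = (math.exp(s - m) for s in (s_ij, s_ik, s_jk))
    z = e_ij + e_ik + e_jk
    # If pair (j, k) is the most similar, item i is the odd one out, and so on.
    return {"i": e_jk / z, "j": e_ik / z, "k": e_ij / z}

def triplet_nll(x_i, x_j, x_k, odd):
    """Negative log-likelihood of one observed odd-one-out judgment."""
    return -math.log(odd_one_out_probs(x_i, x_j, x_k)[odd])

# Toy embeddings (hypothetical): a and b are similar, c differs from both.
a, b, c = [1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]
probs = odd_one_out_probs(a, b, c)
most_likely_odd = max(probs, key=probs.get)

# Number of distinct triplets that 2,180 videos admit.
n_possible_triplets = math.comb(2180, 3)
```

Under this assumed model, fitting would amount to minimizing the summed `triplet_nll` over all observed judgments with respect to the embedding vectors. The final line simply counts how many distinct triplets 2,180 videos admit, which gives a sense of scale for the roughly 12 million judgments reported.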
