Not One Size Fits All: The Selective Effect of Reasoning Processes for Multimodal Body Language Emotion Recognition in GAI


Abstract

Background: The rapid integration of Generative AI (GAI) into mental health contexts necessitates a rigorous evaluation of its emotion recognition capabilities.

Objective: This study investigated whether integrating reasoning processes, either through specialized GAI architectures or through Chain-of-Thought (CoT) prompting strategies, translates into improved recognition of non-verbal emotion from body language.

Methods: Four models from the Gemini family (Pro 2.0, Pro 1.5, Flash 2.0, and FlashThinking) were evaluated on the EU-Emotion Stimulus Set under Zero-Shot and CoT conditions.

Results: The specialized reasoning model (FlashThinking) achieved a Chance-Corrected Recognition (CCR) score of 0.463, 95% CI [0.310, 0.621], and failed to outperform the generalist architectures. Instead, the high-capacity generalist model (Gemini Pro 2.0) performed best (CCR = 0.722, 95% CI [0.590, 0.838]), significantly outperforming all other models (OR ≤ 0.19, p < .001) and reaching parity with the human benchmark (CCR = 0.717). Interaction analyses revealed divergent effects: CoT prompting significantly improved Gemini Pro 2.0 (OR = 3.97, p < .001) and mitigated its “positivity bias,” whereas it significantly impaired Gemini Flash 2.0 (OR = 0.40, p < .001), creating a specific blind spot for negative emotions.

Conclusion: CoT is not a universal facilitator of emotion recognition. Its effectiveness is contingent on model capacity, and its misapplication to smaller models may compromise clinical safety.
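The abstract reports CCR scores and odds ratios but does not define how they are computed. The Python sketch below is illustrative only, and it rests on two assumptions. First, it assumes CCR rescales raw accuracy against uniform guessing, (p - 1/k) / (1 - 1/k), where k is the number of response options. Second, it computes an unadjusted odds ratio with a Wald 95% CI from a 2×2 table of correct and incorrect counts. The function names `chance_corrected` and `odds_ratio`, the uniform-chance assumption, and the counts in the usage note are all hypothetical. The study's reported ORs, such as the model-by-prompt interaction effects, presumably come from a fuller statistical model that this sketch does not reproduce.

```python
import math


def chance_corrected(p_observed: float, n_options: int) -> float:
    """Rescale raw accuracy so that 0 = chance level and 1 = perfect.

    ASSUMPTION: chance is uniform guessing (1 / n_options). The study's
    exact CCR definition may differ.
    """
    if n_options < 2:
        raise ValueError("need at least two response options")
    chance = 1.0 / n_options
    return (p_observed - chance) / (1.0 - chance)


def odds_ratio(hits_a: int, misses_a: int, hits_b: int, misses_b: int):
    """Unadjusted odds ratio of a correct response in condition A vs. B.

    Returns (OR, (ci_low, ci_high)), using a Wald 95% CI on the log scale.
    This is a plain 2x2 estimate, not a regression with interaction terms.
    """
    if min(hits_a, misses_a, hits_b, misses_b) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    or_ = (hits_a * misses_b) / (misses_a * hits_b)
    # Standard error of log(OR) for a 2x2 table
    se = math.sqrt(1 / hits_a + 1 / misses_a + 1 / hits_b + 1 / misses_b)
    log_or = math.log(or_)
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return or_, ci
```

For example, with made-up numbers, `chance_corrected(0.5, 5)` returns roughly 0.375. `odds_ratio(80, 20, 50, 50)` returns an OR of 4.0 with a CI of about [2.14, 7.49]. Under this formula, accuracy below chance yields a negative score.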
