Not One Size Fits All: The Selective Effect of Reasoning Processes for Multimodal Body Language Emotion Recognition in GAI
Abstract
Background: The rapid integration of Generative AI (GAI) into mental health contexts necessitates a rigorous evaluation of its emotion recognition capabilities.
Objective: This study investigated whether integrating reasoning processes, via specialized GAI architectures or Chain-of-Thought (CoT) prompting strategies, translates to improved recognition of non-verbal emotion from body language.
Methods: Four models from the Gemini family (Pro 2.0, Pro 1.5, Flash 2.0, and FlashThinking) were evaluated using the EU-Emotion Stimulus Set under Zero-Shot and CoT conditions.
Results: The specialized reasoning model (FlashThinking) achieved a Chance-Corrected Recognition (CCR) score of 0.463, 95% CI [0.310, 0.621], failing to outperform generalist architectures. Instead, the high-capacity generalist model (Gemini Pro 2.0) exhibited the highest performance (CCR = 0.722, 95% CI [0.590, 0.838]), significantly outperforming all other models (OR ≤ 0.19, p < .001) and achieving parity with human benchmarks (CCR = 0.717). Interaction analyses revealed divergent effects: CoT prompting significantly enhanced Gemini Pro 2.0 (OR = 3.97, p < .001) and mitigated its “positivity bias,” whereas it significantly impaired Gemini Flash 2.0 (OR = 0.40, p < .001), creating a specific blind spot for negative emotions.
Conclusion: We conclude that CoT is not a universal facilitator for emotion recognition; rather, its effectiveness is contingent on model capacity, and its misapplication to smaller models may compromise clinical safety.
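To give a feel for the size of the reported CoT effects, the sketch below converts an odds ratio into an implied change in recognition probability. The abstract does not state the baseline accuracies or the statistical model behind the odds ratios, so the baseline of 0.50 and the helper name `apply_odds_ratio` are illustrative assumptions, not values or methods from the study.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Return the probability obtained by scaling the baseline odds by an odds ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_prob / (1.0 - baseline_prob)  # probability -> odds
    new_odds = baseline_odds * odds_ratio                  # apply the effect
    return new_odds / (1.0 + new_odds)                     # odds -> probability


# Hypothetical baseline accuracy (assumption; not reported in the abstract).
BASELINE = 0.50

pro_with_cot = apply_odds_ratio(BASELINE, 3.97)    # OR reported for Gemini Pro 2.0
flash_with_cot = apply_odds_ratio(BASELINE, 0.40)  # OR reported for Gemini Flash 2.0

print(f"Pro 2.0 with CoT:   {pro_with_cot:.3f}")    # 0.799
print(f"Flash 2.0 with CoT: {flash_with_cot:.3f}")  # 0.286
```

Under this assumed baseline, the same prompting strategy would move one model from 50% to roughly 80% correct and the other from 50% to roughly 29%, which illustrates why the effect is described as divergent. Different baselines would give different absolute changes.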