Multi-Interaction Modeling with Intelligent Coordination for Multimodal Emotion Recognition

Abstract

Emotion recognition from multimodal signals such as speech, text, and facial cues has garnered increasing attention due to its pivotal role in enhancing human-computer interaction and intelligent communication systems. However, existing approaches often struggle to capture the full intricacy of multimodal interactions, primarily because fusing heterogeneous modalities while mitigating redundancy and preserving complementary information remains challenging. In this study, we introduce MIMIC, a novel framework designed to comprehensively model complex multimodal interactions from diverse perspectives. Specifically, MIMIC constructs three parallel latent representations: a modality-preserving full interaction representation, a cross-modal shared interaction representation, and individualized modality-specific representations. A hierarchical semantic-driven fusion strategy then integrates these representations into a cohesive multimodal interaction space. Extensive experiments demonstrate that MIMIC surpasses prior state-of-the-art methods while remaining efficient, with lower computational complexity and significantly fewer trainable parameters. Our contributions are twofold: (1) a multi-perspective interaction modeling approach that deepens multimodal emotion analysis, and (2) a streamlined, resource-efficient framework suitable for practical deployment in emotion-aware systems.
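
To make the three-view design concrete, the sketch below shows one plausible PyTorch realization of the abstract's description: per-modality private encoders for the individualized representations, a single shared encoder reused across modalities for the cross-modal shared view, a full-interaction view over the concatenated inputs, and a two-level fusion that merges the private and shared views before combining them with the full view. The abstract gives no implementation details, so all module names, layer sizes, feature dimensions, and the exact fusion order here are assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch of MIMIC's three parallel representations and hierarchical
# fusion, based only on the abstract. Layer sizes, feature dimensions, and the
# fusion order are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class MIMICSketch(nn.Module):
    def __init__(self, dims=None, hidden=128, num_classes=7):
        super().__init__()
        # Assumed utterance-level feature sizes per modality; replace with the real ones.
        dims = dims or {"text": 768, "audio": 74, "vision": 35}
        self.modalities = list(dims.keys())
        # Individualized modality-specific representations: one private encoder per modality.
        self.private = nn.ModuleDict(
            {m: nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for m, d in dims.items()}
        )
        # Cross-modal shared interaction: project each modality, then reuse one shared encoder.
        self.project = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in dims.items()})
        self.shared = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        # Modality-preserving full interaction: encode the concatenation of all raw inputs.
        self.full = nn.Sequential(nn.Linear(sum(dims.values()), hidden), nn.ReLU())
        # Hierarchical fusion: level 1 merges private + shared views, level 2 adds the full view.
        self.fuse1 = nn.Linear(2 * hidden * len(dims), hidden)
        self.fuse2 = nn.Linear(2 * hidden, hidden)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, inputs):
        # inputs: dict mapping modality name -> (batch, feature_dim) tensor.
        private = {m: self.private[m](inputs[m]) for m in self.modalities}
        shared = {m: self.shared(self.project[m](inputs[m])) for m in self.modalities}
        full = self.full(torch.cat([inputs[m] for m in self.modalities], dim=-1))
        level1 = torch.relu(self.fuse1(torch.cat(
            [torch.cat([private[m], shared[m]], dim=-1) for m in self.modalities], dim=-1)))
        fused = torch.relu(self.fuse2(torch.cat([level1, full], dim=-1)))
        return self.classifier(fused)


# Usage with random stand-in features for a batch of four utterances.
model = MIMICSketch()
batch = {"text": torch.randn(4, 768), "audio": torch.randn(4, 74), "vision": torch.randn(4, 35)}
logits = model(batch)  # (4, num_classes) emotion logits
```

One design choice worth noting in this sketch: sharing a single encoder across projected modalities is a common way to approximate a "shared interaction" space, while keeping separate private encoders preserves complementary, modality-specific information, which matches the redundancy-versus-complementarity trade-off the abstract highlights.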
