Multi-Interaction Modeling with Intelligent Coordination for Multimodal Emotion Recognition

Abstract

Emotion recognition from multimodal signals such as speech, text, and facial cues has garnered increasing attention due to its pivotal role in enhancing human-computer interaction and intelligent communication systems. However, existing approaches often struggle to capture the full intricacy of multimodal interactions, primarily because fusing heterogeneous modalities while mitigating redundancy and preserving complementary information remains challenging. In this study, we introduce MIMIC, a novel framework designed to comprehensively model complex multimodal interactions from diverse perspectives. Specifically, MIMIC constructs three parallel latent representations: a modality-preserving full interaction representation, a cross-modal shared interaction representation, and individualized modality-specific representations. A hierarchical semantic-driven fusion strategy then integrates these representations into a cohesive multimodal interaction space. Extensive experiments demonstrate that MIMIC surpasses prior state-of-the-art methods while remaining efficient, with lower computational complexity and significantly fewer trainable parameters. Our contributions are twofold: (1) a multi-perspective interaction modeling approach that deepens multimodal emotion analysis, and (2) a streamlined, resource-efficient framework suitable for practical deployment in emotion-aware systems.
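
To make the three-view design concrete, the sketch below shows one plausible PyTorch realization of the abstract's description: per-modality private encoders for the individualized representations, a single shared encoder reused across modalities for the cross-modal shared view, a full-interaction view over the concatenated inputs, and a two-level fusion that merges the private and shared views before combining them with the full view. The abstract gives no implementation details, so all module names, layer sizes, feature dimensions, and the exact fusion order here are assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch of MIMIC's three parallel representations and hierarchical
# fusion, based only on the abstract. Layer sizes, feature dimensions, and the
# fusion order are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class MIMICSketch(nn.Module):
    def __init__(self, dims=None, hidden=128, num_classes=7):
        super().__init__()
        # Assumed utterance-level feature sizes per modality; replace with the real ones.
        dims = dims or {"text": 768, "audio": 74, "vision": 35}
        self.modalities = list(dims.keys())
        # Individualized modality-specific representations: one private encoder per modality.
        self.private = nn.ModuleDict(
            {m: nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for m, d in dims.items()}
        )
        # Cross-modal shared interaction: project each modality, then reuse one shared encoder.
        self.project = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in dims.items()})
        self.shared = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        # Modality-preserving full interaction: encode the concatenation of all raw inputs.
        self.full = nn.Sequential(nn.Linear(sum(dims.values()), hidden), nn.ReLU())
        # Hierarchical fusion: level 1 merges private + shared views, level 2 adds the full view.
        self.fuse1 = nn.Linear(2 * hidden * len(dims), hidden)
        self.fuse2 = nn.Linear(2 * hidden, hidden)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, inputs):
        # inputs: dict mapping modality name -> (batch, feature_dim) tensor.
        private = {m: self.private[m](inputs[m]) for m in self.modalities}
        shared = {m: self.shared(self.project[m](inputs[m])) for m in self.modalities}
        full = self.full(torch.cat([inputs[m] for m in self.modalities], dim=-1))
        level1 = torch.relu(self.fuse1(torch.cat(
            [torch.cat([private[m], shared[m]], dim=-1) for m in self.modalities], dim=-1)))
        fused = torch.relu(self.fuse2(torch.cat([level1, full], dim=-1)))
        return self.classifier(fused)


# Usage with random stand-in features for a batch of four utterances.
model = MIMICSketch()
batch = {"text": torch.randn(4, 768), "audio": torch.randn(4, 74), "vision": torch.randn(4, 35)}
logits = model(batch)  # (4, num_classes) emotion logits
```

One design choice worth noting in this sketch: sharing a single encoder across projected modalities is a common way to approximate a "shared interaction" space, while keeping separate private encoders preserves complementary, modality-specific information, which matches the redundancy-versus-complementarity trade-off the abstract highlights.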
