Emotion-BIND: Multimodal Emotion Recognition and Reasoning in Conversation


Abstract

Constructing a Multimodal Emotion Recognition in Conversation (MERC) model is crucial for understanding users' emotions. Current methods often align cross-modal features with linear layers while keeping encoders frozen, which can cause feature loss and alignment inaccuracies. Additionally, traditional 1D positional encoding limits the capture of important information in dynamic content such as videos. To tackle these issues, we propose a new approach that integrates cross-modal feature extraction without relying on linear transformations, so that features share a unified vector space; this significantly improves alignment precision and reduces feature loss. For dynamic content, we introduce the m-ROPE technique, which decomposes positional encoding into three dimensions (time, height, and width), enhancing the model's spatial understanding of text, images, and videos. By lowering position ID values for images and videos, we enable better extrapolation to longer sequences during inference. Experimental results demonstrate that our model achieves an unweighted average recall (UAR) of 49.44% and a weighted average recall (WAR) of 71.02% on the DFEM dataset, outperforming other methods. On the MELD dataset, it reaches a WAR of 63.88%, exceeding the second-best model by 4.67%, and it excels at recognizing complex emotions such as fear and surprise.
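To illustrate the position-ID idea behind m-ROPE, the following is a minimal sketch (not the authors' implementation): text tokens receive identical IDs on all three axes, visual tokens are decomposed into temporal, height, and width indices, and each new segment starts after the maximum ID used so far, so a video consumes far fewer position IDs than its raw token count. The function name and segment encoding are illustrative assumptions.

```python
def mrope_position_ids(segments):
    """Build (time, height, width) position IDs for a mixed sequence.

    segments: list of ("text", n_tokens) or ("video", n_frames, h, w).
    Text tokens use identical IDs on all three axes; video patches get
    decomposed (time, height, width) indices. The next segment starts
    at max-used-ID + 1, which keeps IDs small for long videos and
    helps extrapolation to longer sequences at inference.
    """
    ids = []    # list of (t, h, w) triples, one per token
    start = 0   # next free position ID
    for seg in segments:
        if seg[0] == "text":
            n = seg[1]
            ids.extend((start + i, start + i, start + i) for i in range(n))
            start += n
        else:  # "video": frames x height x width grid of patches
            _, frames, h, w = seg
            for t in range(frames):
                for y in range(h):
                    for x in range(w):
                        ids.append((start + t, start + y, start + x))
            # a video advances the ID counter by its largest axis only,
            # not by its total patch count
            start += max(frames, h, w)
    return ids
```

For example, a 2x2x2 video (8 patches) advances the position counter by only 2, so text following the video resumes at a much lower ID than a flat 1D scheme would assign.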
