GBV-Net: Hierarchical Fusion of Facial Expressions and Physiological Signals for Multimodal Emotion Recognition
Abstract
A core challenge in multimodal emotion recognition is precisely capturing the inherently interactive nature of human emotions across modalities. Existing methods often process visual signals (facial expressions) and physiological signals (EEG, ECG, GSR) in isolation, failing to exploit their complementary strengths. To address this limitation, this paper presents a new multimodal emotion recognition framework, the Gated Biological Visual Network (GBV-Net), which improves recognition accuracy through deep, synergistic fusion of facial expressions and physiological signals. GBV-Net integrates three core modules: (1) a facial feature extractor based on a modified ConvNeXt V2 architecture incorporating lightweight Transformers, designed to capture subtle spatio-temporal dynamics in facial expressions; (2) a hybrid physiological feature extractor combining 1D convolutions, Temporal Convolutional Networks (TCNs), and convolutional self-attention, which models both local patterns and long-range temporal dependencies in physiological signals; and (3) an enhanced gated attention fusion module that adaptively learns inter-modal weights to achieve dynamic, synergistic integration at the feature level. Extensive experiments on the publicly available DEAP and MAHNOB-HCI datasets show that GBV-Net surpasses contemporary methods: on DEAP, the model attains classification accuracies of 94.68% for Valence and 95.93% for Arousal; on MAHNOB-HCI, it achieves 97.48% for Valence and 97.78% for Arousal. These results substantiate that GBV-Net effectively captures deep interactive information between multimodal signals, thereby improving emotion recognition accuracy.
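To make the gated fusion idea concrete, the sketch below shows one common form of gated feature-level fusion: a sigmoid gate computed from both modalities decides, per feature dimension, how much weight the visual feature receives versus the physiological one. This is a minimal, hypothetical illustration in plain Python (the function name, per-dimension scalar weights `w_vis`, `w_phys`, and bias `b` are assumptions for exposition); the paper's actual module is a learned attention-based network, not this exact formula.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(f_vis, f_phys, w_vis, w_phys, b):
    """Element-wise gated fusion of two aligned feature vectors.

    For each dimension i, a gate
        g_i = sigmoid(w_vis[i] * f_vis[i] + w_phys[i] * f_phys[i] + b[i])
    adaptively weighs the visual feature against the physiological one:
        fused_i = g_i * f_vis[i] + (1 - g_i) * f_phys[i]
    In a trained model, w_vis, w_phys, and b would be learned parameters.
    """
    fused = []
    for fv, fp, wv, wp, bi in zip(f_vis, f_phys, w_vis, w_phys, b):
        g = sigmoid(wv * fv + wp * fp + bi)
        fused.append(g * fv + (1.0 - g) * fp)
    return fused
```

For example, a strongly positive bias drives the gate toward 1 (trust the visual stream), while a strongly negative bias drives it toward 0 (trust the physiological stream); during training the gate learns this trade-off from data rather than from fixed biases.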