Research on a Multimodal Emotion Perception Model Based on GCN+GIN Hybrid Model

Abstract

Graph neural networks (GNNs) have demonstrated strong performance on graph-structured data in recent years, particularly in capturing complex relationships among data samples, giving them advantages over traditional neural networks. However, challenges persist, including difficulties in cross-modal information fusion, inadequate modeling of inter-modal relationships, and high computational cost. To address these limitations, this paper proposes GGMEN, a novel model that integrates the local neighborhood aggregation capability of graph convolutional networks (GCNs) with the global structural expressiveness of graph isomorphism networks (GINs). For shallow feature extraction, joint time-frequency analysis is used to extract 14 representative statistical features from the physiological signals. In parallel, a Transformer captures spatial features from individual facial expression video frames, enabling spatio-temporal modeling of facial expressions. The GCN layer models temporal dependencies in the physiological signals and spatial relationships among facial key points, while the GIN layer strengthens the modeling of complex higher-order relationships. Multimodal emotion perception is then achieved through attention-based modality fusion. Experiments on the DEAP dataset validate the model's effectiveness across multiple emotion perception benchmarks, achieving an emotion recognition accuracy of 81.25%. Comparative analyses confirm that the proposed framework improves accuracy over existing models.
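To make the described pipeline concrete, the sketch below shows one plausible arrangement of the components named in the abstract: a GCN layer for local neighborhood aggregation, a GIN layer for higher-order structure, and attention-based fusion of the physiological and facial modalities. This is a minimal illustration, not the authors' implementation; the class names (GCNLayer, GINLayer, AttentionFusion, GGMEN), the layer sizes, the row-normalized dense adjacency matrices, and the four-class output are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNLayer(nn.Module):
    """Local neighborhood aggregation: normalized adjacency times node features."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # adj: (batch, N, N) dense adjacency with self-loops added by the caller.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        norm_adj = adj / deg  # row normalization (a simplification of D^-1/2 A D^-1/2)
        return F.relu(self.linear(norm_adj @ x))


class GINLayer(nn.Module):
    """Graph isomorphism layer: (1 + eps) * x + sum over neighbors, then an MLP."""
    def __init__(self, dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x, adj):
        return self.mlp((1 + self.eps) * x + adj @ x)


class AttentionFusion(nn.Module):
    """Attention-weighted fusion of per-modality graph embeddings."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, modality_embeddings):            # list of (batch, dim) tensors
        stacked = torch.stack(modality_embeddings, 1)   # (batch, M, dim)
        weights = torch.softmax(self.score(stacked), 1) # one weight per modality
        return (weights * stacked).sum(dim=1)


class GGMEN(nn.Module):
    """Hypothetical GCN+GIN hybrid with attention-based modality fusion."""
    def __init__(self, phys_dim=14, face_dim=64, hidden=64, n_classes=4):
        super().__init__()
        self.phys_gcn, self.phys_gin = GCNLayer(phys_dim, hidden), GINLayer(hidden)
        self.face_gcn, self.face_gin = GCNLayer(face_dim, hidden), GINLayer(hidden)
        self.fusion = AttentionFusion(hidden)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, phys_x, phys_adj, face_x, face_adj):
        # GCN then GIN per modality, mean-pooled over graph nodes.
        phys = self.phys_gin(self.phys_gcn(phys_x, phys_adj), phys_adj).mean(dim=1)
        face = self.face_gin(self.face_gcn(face_x, face_adj), face_adj).mean(dim=1)
        return self.classifier(self.fusion([phys, face]))
```

In this sketch the physiological graph would carry the 14 statistical features as node attributes and the facial graph would carry Transformer-derived frame or key-point features; both are placeholders standing in for whatever graph construction the paper actually uses.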
