Facial Emotion Recognition Based on ResNet18 with Multi-Dimensional Attention Mechanisms
Abstract
Emotion, a fundamental human characteristic, is the most important non-verbal means of expressing inner feelings and intentions, and it plays a crucial role in communication. Although various deep learning frameworks have been applied to emotion recognition, facial images carry rich emotional cues, in the eyebrows, mouth corners, and eyes as well as in changes in skin tone, light-shadow contrast, and muscle-tension distribution, and effectively characterizing these features across multiple dimensions remains a significant challenge in facial emotion recognition. This study proposes an enhanced ResNet18 architecture incorporating three specialized attention mechanisms: (1) channel-wise attention for feature refinement, (2) spatial attention for regional emphasis, and (3) multi-scale attention for hierarchical feature fusion. This synergistic design enables comprehensive integration of features across global contexts, local details, and varying granularities, significantly improving facial emotion recognition accuracy. We evaluated the model on the DEAP dataset with classification experiments based on arousal and valence. Binary classification accuracy reached 99.21% for valence and 99.20% for arousal, and four-class emotion recognition reached 97.45%. The experimental results demonstrate that the proposed method effectively extracts multi-dimensional features from facial expressions and improves the accuracy and robustness of emotion recognition. Our approach provides innovative feature extraction techniques and a theoretical foundation for emotion recognition based on facial images, offering significant reference value for enhancing recognition accuracy.
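The abstract does not specify how the three attention mechanisms are implemented, so the following is only a minimal NumPy sketch of the general idea: squeeze-and-excitation-style channel attention, a CBAM-style spatial gate, and a simple multi-scale fusion by upsampling and averaging. The weight matrices `w1`/`w2`, the fixed reduction ratio, and the nearest-neighbor upsampling are illustrative assumptions; in the actual model these would be learned modules inserted into the ResNet18 backbone.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation-style channel attention on a (C, H, W) map."""
    z = x.mean(axis=(1, 2))                     # squeeze: global average pool -> (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))   # excitation: FC -> ReLU -> FC -> sigmoid
    return x * s[:, None, None]                 # reweight each channel by its score

def spatial_attention(x):
    """CBAM-style spatial attention: pool over channels, gate each location."""
    avg = x.mean(axis=0)                        # (H, W) average over channels
    mx = x.max(axis=0)                          # (H, W) max over channels
    gate = sigmoid(avg + mx)                    # simplified fusion of the two maps
    return x * gate[None, :, :]

def multi_scale_fusion(maps):
    """Fuse same-channel maps of different resolutions: upsample, then average."""
    target_h, target_w = maps[0].shape[1:]
    ups = []
    for m in maps:
        rh, rw = target_h // m.shape[1], target_w // m.shape[2]
        ups.append(np.repeat(np.repeat(m, rh, axis=1), rw, axis=2))
    return np.mean(ups, axis=0)

# Toy feature map and illustrative (randomly initialized) excitation weights.
C, H, W = 8, 4, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // 4, C))           # reduction ratio 4 (assumed)
w2 = rng.standard_normal((C, C // 4))

y = spatial_attention(channel_attention(x, w1, w2))
print(y.shape)  # (8, 4, 4): attention preserves the feature-map shape
```

Because both attention scores lie in (0, 1), each mechanism only rescales features rather than changing the tensor shape, which is what lets these blocks be dropped between ResNet18 stages without altering the rest of the network.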