An Improved Convolution Neural Network and Variational Autoencoder Model with Attention Modules for Enhanced Facial Expression Recognition
Abstract
Facial Expression Recognition (FER) is a crucial task in artificial intelligence, with applications in fields such as psychological studies, driver stress monitoring, interactive game design, and mobile learning. Recent advancements in feature extraction techniques, including the use of Convolutional Neural Networks (CNNs), have shown significant promise in enhancing FER systems. This paper proposes a novel FER method that combines a CNN, a Channel Attention Enhancement Module (CAEM), a Spatial Attention Enhancement Module (SAEM), and a Variational Autoencoder (VAE) to improve feature extraction and classification accuracy. The proposed approach first uses a CNN to extract hierarchical feature maps from input images, which are then refined by attention mechanisms: the CAEM applies channel-wise attention to emphasize critical features, while the SAEM highlights spatially significant regions of the feature maps. A VAE is integrated to learn compact, low-dimensional latent representations of the feature maps, leveraging the reparameterization trick for efficient sampling and reconstruction. The outputs of the CAEM, SAEM, and VAE are fused via element-wise addition to produce an enhanced feature representation, which is passed through fully connected layers with a softmax activation for classification. The network is trained with a hybrid loss function that combines a classification loss and a VAE reconstruction loss to balance classification accuracy and representation quality. The model's performance was evaluated on three benchmark datasets: the FER Challenge (FER-2013), the Extended Cohn-Kanade (CK+), and the Japanese Female Facial Expression (JAFFE) datasets. The proposed method achieved accuracy rates of 91.00% on FER-2013, 97.00% on JAFFE, and 98.80% on CK+, surpassing state-of-the-art FER methods. These results demonstrate the effectiveness of the CNN-VAE model, particularly its ability to leverage attention mechanisms and variational autoencoding for robust facial expression recognition.
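To make the described pipeline concrete, the following is a minimal PyTorch sketch of the abstract's architecture: a CNN backbone, a channel-attention branch, a spatial-attention branch, a VAE branch with the reparameterization trick, element-wise fusion, a softmax classifier, and a hybrid classification-plus-reconstruction loss. The backbone depth, channel counts, latent dimension, input size, and loss weight `beta` are illustrative assumptions, not values specified in the abstract.

```python
# Hypothetical sketch of the CNN + CAEM + SAEM + VAE model described above.
# Layer sizes, the 48x48 grayscale input, and beta are assumed for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """Channel Attention Enhancement Module (squeeze-and-excitation style gate)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))            # (B, C) channel weights
        return x * w.unsqueeze(-1).unsqueeze(-1)   # re-weight each channel


class SpatialAttention(nn.Module):
    """Spatial Attention Enhancement Module (CBAM-style spatial gate)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w                                # re-weight each spatial location


class FerCnnVae(nn.Module):
    def __init__(self, num_classes=7, channels=64, feat_hw=6, latent_dim=128):
        super().__init__()
        # Small CNN backbone producing (B, channels, feat_hw, feat_hw) feature maps.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(feat_hw))
        self.caem = ChannelAttention(channels)
        self.saem = SpatialAttention()
        feat_dim = channels * feat_hw * feat_hw
        # VAE branch over the flattened feature map.
        self.to_mu = nn.Linear(feat_dim, latent_dim)
        self.to_logvar = nn.Linear(feat_dim, latent_dim)
        self.decode = nn.Linear(latent_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        f = self.backbone(x)
        flat = f.flatten(1)
        mu, logvar = self.to_mu(flat), self.to_logvar(flat)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        recon = self.decode(z).view_as(f)
        fused = self.caem(f) + self.saem(f) + recon               # element-wise fusion
        logits = self.classifier(fused.flatten(1))                # softmax applied inside the loss
        return logits, recon, f, mu, logvar


def hybrid_loss(logits, labels, recon, target, mu, logvar, beta=0.1):
    """Classification loss plus VAE reconstruction and KL terms (beta is assumed)."""
    ce = F.cross_entropy(logits, labels)
    rec = F.mse_loss(recon, target)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return ce + beta * (rec + kl)


if __name__ == "__main__":
    model = FerCnnVae()
    images = torch.randn(4, 1, 48, 48)              # batch of grayscale face crops
    labels = torch.randint(0, 7, (4,))
    logits, recon, feats, mu, logvar = model(images)
    loss = hybrid_loss(logits, labels, recon, feats.detach(), mu, logvar)
    print(logits.shape, loss.item())
```

In this sketch the VAE reconstructs the backbone feature map rather than the raw image, so the reconstruction term regularizes the same representation that the attention branches refine; the paper's exact reconstruction target and fusion details may differ.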