Transfer Learning Through Adaptive Fine-Tuning and Attention Mechanism Framework for Facial Expression Recognition in the Elderly

Abstract

With the advancement of Convolutional Neural Networks (CNNs), facial expression recognition (FER) has become one of the most active research topics, achieving promising results in computer vision and pattern recognition. However, most existing approaches do not consider the effect of facial attributes, such as age-related changes in facial expressions, and instead focus primarily on younger faces, leaving a gap in research on older individuals. The elderly population, especially those living alone, is rapidly increasing worldwide, emphasizing the need for emotionally intelligent devices in elderly care. The main objective of this study is to fill this gap by evaluating and improving pre-trained models using transfer learning, fine-tuning, and integration with an attention mechanism. We fine-tune pre-trained models to transfer knowledge to the target domain of elderly facial expressions. Additionally, we incorporate a squeeze-and-excitation (SE) module, an attention mechanism, to enhance the CNN’s performance in terms of both convergence speed and classification accuracy. The primary innovation of this model lies in integrating the SE block within the depthwise separable convolution layers. This integration improves the model’s ability to focus on discriminative features and enhances performance in elderly facial expression recognition by addressing age-related confounding factors such as wrinkles and facial component deformation. Experimental results indicate that the MobileNet model with SE outperforms state-of-the-art methods, achieving an accuracy of 95.47% on the FACE dataset. An ablation study, along with t-distributed stochastic neighbor embedding (t-SNE) and Gradient-weighted Class Activation Mapping (Grad-CAM) visualizations, demonstrates that the proposed method offers improved learning of elderly facial expression representations.
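
To make the architectural idea concrete, the sketch below shows one way an SE block can be attached to a depthwise separable convolution block of the kind used in MobileNet. This is an illustrative PyTorch example, not the authors' implementation; the channel counts, reduction ratio of 16, and layer ordering are assumptions for demonstration only.

```python
# Illustrative sketch (not the authors' code): a squeeze-and-excitation (SE)
# block appended to a depthwise separable convolution, as described in the abstract.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: global average pool, two FC layers, sigmoid gate."""
    def __init__(self, channels, reduction=16):  # reduction ratio is an assumption
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # squeeze: B x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                         # excitation: per-channel weights in [0, 1]
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                              # re-weight the feature maps channel-wise

class SEDepthwiseSeparable(nn.Module):
    """Depthwise separable convolution (MobileNet-style) followed by an SE block."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
        )
        self.pointwise = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.se = SEBlock(out_ch)

    def forward(self, x):
        return self.se(self.pointwise(self.depthwise(x)))

# Example: a single block applied to a batch of feature maps.
block = SEDepthwiseSeparable(in_ch=32, out_ch=64)
out = block(torch.randn(8, 32, 56, 56))  # -> torch.Size([8, 64, 56, 56])
```

In a fine-tuning setup like the one described, such blocks would replace or augment the depthwise separable layers of a pre-trained MobileNet so the channel-attention weights can emphasize expression-relevant features over age-related texture such as wrinkles.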
