A lightweight MobileViT with Linear Differential Attention for micro-expression recognition

Abstract

Extracting micro-expression image features with Transformer-based models is a common strategy. However, attention noise may cause the model to focus on irrelevant information, and the complexity and resource consumption of Transformer models increase significantly with the number of input tokens. To address these problems, this paper proposes Linear Differential Attention (LDA) to reduce the computation and attention noise of the MobileViT model. First, we modify the self-attention computation using piecewise functions and Gaussian kernel functions, reducing its complexity to linear; this yields Linear Attention (LA). Then, we construct a pair of linear attention maps and use the difference between them to compute the attention scores, which strengthens the model's focus on key information. Finally, we replace the Multi-Head Self-Attention in the MobileViT block with LDA to keep the model lightweight. Experimental results show that the improved MobileViT model reaches 85.48% accuracy on CASME II and 76.5% on SAMM, using only 0.899G floating-point operations (FLOPs) and 4.95M parameters, demonstrating the effectiveness of our improvements.
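To make the described mechanism concrete, the sketch below shows one plausible reading of Linear Differential Attention in PyTorch: a kernelized (linear-complexity) attention computed twice with independent query/key projections, with the final response taken as the difference of the two maps. The exact piecewise/Gaussian feature map from the paper is not specified in the abstract, so a simple elu(x)+1 feature map stands in for it, and the learnable mixing weight `lam` is likewise an assumption, not the authors' formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def kernel_feature_map(x):
    # Hypothetical stand-in for the paper's piecewise / Gaussian-kernel
    # construction; elu(x) + 1 keeps features positive so the kernelized
    # attention remains well defined.
    return F.elu(x) + 1.0


def linear_attention(q, k, v, eps=1e-6):
    # Kernelized ("linear") attention: softmax(QK^T)V is approximated by
    # phi(Q) (phi(K)^T V), so the cost grows linearly with the token count.
    q, k = kernel_feature_map(q), kernel_feature_map(k)
    kv = torch.einsum("bnd,bne->bde", k, v)                 # key/value summary
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)


class LinearDifferentialAttention(nn.Module):
    """Sketch of an LDA block: two linear-attention maps whose difference
    forms the output, intended to suppress attention noise."""

    def __init__(self, dim, lam_init=0.5):
        super().__init__()
        # Two independent query/key projections, one shared value projection.
        self.q_proj = nn.Linear(dim, 2 * dim, bias=False)
        self.k_proj = nn.Linear(dim, 2 * dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)
        self.lam = nn.Parameter(torch.tensor(lam_init))     # assumed learnable weight

    def forward(self, x):                                   # x: (batch, tokens, dim)
        q1, q2 = self.q_proj(x).chunk(2, dim=-1)
        k1, k2 = self.k_proj(x).chunk(2, dim=-1)
        v = self.v_proj(x)
        # Difference of the two linear-attention responses.
        out = linear_attention(q1, k1, v) - self.lam * linear_attention(q2, k2, v)
        return self.out_proj(out)


if __name__ == "__main__":
    tokens = torch.randn(2, 196, 96)      # e.g. patch tokens inside a MobileViT block
    lda = LinearDifferentialAttention(dim=96)
    print(lda(tokens).shape)              # torch.Size([2, 196, 96])
```

In this reading, the module is a drop-in replacement for the Multi-Head Self-Attention inside the MobileViT block, which is where the parameter and FLOP savings quoted in the abstract would come from.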