WLAM Attention: Plug-and-Play Wavelet Transform Linear Attention
Abstract
Linear attention has gained popularity in recent years due to its lower computational complexity compared to Softmax attention, but its relatively weaker performance has limited its widespread adoption. To address this issue, we propose a plug-and-play module, the Wavelet-Enhanced Linear Attention Mechanism (WLAM), which integrates the discrete wavelet transform (DWT) with linear attention, strengthening the model's expression of global contextual information while improving its capture of local features. First, we introduce the DWT into the attention mechanism to decompose the input features: the original input features generate the query Q, the low-frequency coefficients generate the key K, and the high-frequency coefficients are convolved to produce the value V. This embeds global and local information into different components of the attention mechanism, enhancing the model's perception of both fine detail and overall structure. Second, we apply multi-scale convolution to the high-frequency wavelet coefficients and incorporate a Squeeze-and-Excitation (SE) module to enhance feature selectivity; the inverse discrete wavelet transform (IDWT) then reintegrates the multi-scale processed information into the spatial domain, addressing the limitations of linear attention in handling multi-scale and local information. Finally, inspired by certain structures of the Mamba network, we introduce a forget gate and an improved block design into the linear attention framework, inheriting the core advantages of the Mamba architecture. Following a similar rationale, we leverage the lossless downsampling property of wavelet transforms to combine the downsampling module with the attention module, yielding the Wavelet Downsampling Attention (WDSA) module.
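The DWT-based Q/K/V construction described above can be sketched in PyTorch. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a single-level Haar DWT (the paper does not specify the wavelet), a plain 3x3 convolution on the stacked high-frequency bands in place of the multi-scale convolution and SE module, and a ReLU-feature-map linear attention without the forget gate; the class name `WLAMSketch` and all layer choices are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def haar_dwt(x):
    """Single-level 2D Haar DWT on x of shape (B, C, H, W), H and W even.
    Returns the low-frequency band LL and the three high-frequency bands
    (LH, HL, HH) stacked along the channel dimension."""
    a = x[:, :, 0::2, 0::2]
    b = x[:, :, 0::2, 1::2]
    c = x[:, :, 1::2, 0::2]
    d = x[:, :, 1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, torch.cat([lh, hl, hh], dim=1)

class WLAMSketch(nn.Module):
    """Hypothetical sketch of the WLAM idea: Q from the original features,
    K from the low-frequency (LL) band, V from convolved high-frequency
    bands, combined by a linear attention (no O(N^2) score matrix)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        # Stand-in for the paper's multi-scale convolution + SE module.
        self.v_conv = nn.Conv2d(3 * dim, dim, kernel_size=3, padding=1)

    def forward(self, x):                                  # x: (B, C, H, W)
        B, C, H, W = x.shape
        ll, high = haar_dwt(x)                             # (B,C,H/2,W/2), (B,3C,H/2,W/2)
        q = self.q(x.flatten(2).transpose(1, 2))           # (B, HW, C)
        k = self.k(ll.flatten(2).transpose(1, 2))          # (B, HW/4, C)
        v = self.v_conv(high).flatten(2).transpose(1, 2)   # (B, HW/4, C)
        # Linear attention: phi(Q) (phi(K)^T V) with a ReLU feature map,
        # so cost is linear in sequence length instead of quadratic.
        q, k = F.relu(q) + 1e-6, F.relu(k) + 1e-6
        kv = torch.einsum('bnc,bnd->bcd', k, v)            # (B, C, C)
        z = 1.0 / (q @ k.sum(dim=1, keepdim=True).transpose(1, 2) + 1e-6)
        out = torch.einsum('bnc,bcd->bnd', q, kv) * z      # (B, HW, C)
        return out.transpose(1, 2).reshape(B, C, H, W)
```

Note that because K and V come from the half-resolution wavelet bands, the key/value sequence is four times shorter than the query sequence, which is where the combination with downsampling (the WDSA idea) comes from.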
This integration reduces network size and computational load while mitigating the information loss associated with downsampling. We apply WLAM to classical networks such as PVT, Swin, and CSwin, achieving significant performance improvements on image classification tasks. Furthermore, combining wavelet linear attention with the WDSA module, we construct WDLMFormer, which achieves an accuracy of 84.2% on the ImageNet-1K dataset.