SPSTNet: Image Super-Resolution Using Spatial Pyramid Swin Transformer Network

Abstract

Recent research on enhancing image resolution using convolutional neural networks (CNNs) has shown encouraging outcomes. However, due to the intrinsic locality of the convolution operator, CNN-based methods have a limited capacity to capture contextual information and long-range dependencies. To address this problem, we propose a hybrid network that integrates a CNN and a Transformer and shows impressive performance in learning long-range contextual information for image SR. Specifically, by introducing a spatial pyramid pooling (SPP) module into the multi-head attention (MHA), the Spatial Pyramid Swin Transformer (SPST) module achieves linear computational complexity and integrates multi-scale features. This enables the model to learn a wider range of multi-scale features and enhances the capability of the attention matrix. Moreover, the gated convolution (GC) module employs the abundant low-frequency information from the low-resolution input to assist reconstruction, and provides a learnable dynamic feature-selection mechanism that further constrains training and improves performance. Extensive experiments on benchmark datasets demonstrate the efficacy of our approach. The results indicate that our method surpasses alternative approaches in parameter count and computational efficiency. In particular, the proposed method increases PSNR by 0.05 dB and uses 1.6M fewer parameters than SwinIR, resulting in a shorter inference time.
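To make the complexity argument concrete, the sketch below illustrates the general idea of pooling keys and values through a spatial pyramid before attention: the pooled token count is fixed by the pyramid levels, so attending from all H×W positions costs O(H·W) rather than O((H·W)²). This is a toy single-head NumPy illustration under our own assumptions (function names, shapes, and average pooling are ours), not the paper's actual SPST implementation.

```python
import numpy as np

def spp_tokens(x, levels=(1, 2, 4)):
    """Spatial pyramid pooling: average-pool an (H, W, C) feature map over
    l x l grids for each level l and flatten the results into a token list.
    The output has sum(l*l for l in levels) tokens regardless of H and W."""
    H, W, C = x.shape
    tokens = []
    for l in levels:
        for i in range(l):
            for j in range(l):
                patch = x[i * H // l:(i + 1) * H // l,
                          j * W // l:(j + 1) * W // l, :]
                tokens.append(patch.mean(axis=(0, 1)))
    return np.stack(tokens)  # shape: (sum of l*l, C)

def pyramid_attention(x, levels=(1, 2, 4)):
    """Toy single-head attention: queries are all H*W positions, but keys and
    values are the fixed-size pooled pyramid, so the score matrix is
    (H*W, M) with constant M instead of (H*W, H*W)."""
    H, W, C = x.shape
    q = x.reshape(H * W, C)           # one query per spatial position
    kv = spp_tokens(x, levels)        # pooled keys/values, M tokens
    scores = q @ kv.T / np.sqrt(C)    # (H*W, M) -- linear in H*W
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return (attn @ kv).reshape(H, W, C)
```

With levels (1, 2, 4) the pooled sequence always has 21 tokens, so doubling the input resolution only doubles the attention cost; the real SPST module additionally uses learned projections, multiple heads, and Swin-style windows, which this sketch omits.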
