Lightweight Super-Resolution Reconstruction Architecture of Remote Sensing Images Using a Residual Hierarchical Transformer Network

Abstract

Remote sensing image super-resolution technology aims to enhance spatial details and is of great significance for the high-quality interpretation of satellite imagery. Recently, Transformer-based models have shown competitive performance in single image super-resolution (SISR). However, current Transformer-based SR approaches often employ window self-attention with fixed small window sizes, which limits the receptive field to a single scale and prevents the network from gathering multi-scale information such as local textures and repetitive patterns, impeding the model’s ability to reconstruct remote sensing images. Moreover, global self-attention incurs quadratic computational complexity, rendering it inefficient for remote sensing image super-resolution (RSISR) tasks, which involve processing high-resolution images. To address these issues, we propose a vision Transformer architecture called the residual hierarchical transformer network (RHTN). Specifically, we develop a residual hierarchical transformer block (RHTB) as the building block of RHTN. In the RHTB, we introduce a novel spatial-channel self-attention mechanism with linear complexity relative to the window size. This design harvests both spatial structural information and channel-wise features from the hierarchical window framework while remaining computationally tractable. We then adopt a spatial-gate feed-forward network to model additional non-linear spatial information. Comprehensive experiments on multiple benchmark datasets demonstrate that our proposed RHTN surpasses state-of-the-art methods in both quantitative metrics and visual quality.
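The abstract does not detail how the spatial-channel self-attention achieves linear complexity in the window size. A common way to obtain this property is to transpose the attention computation so that the attention matrix is formed over channels rather than spatial positions, making its size independent of the number of pixels. The sketch below is an illustrative NumPy toy (not the authors' implementation; the function name, projection matrices `wq`/`wk`/`wv`, and shapes are assumptions) showing why channel-wise attention scales linearly with the spatial extent:

```python
import numpy as np

def channel_self_attention(x, wq, wk, wv):
    """Toy channel-wise self-attention (illustrative sketch only).

    x: array of shape (C, N), a feature map flattened over N = H * W
    spatial positions. Attention is computed across the C channels, so
    the attention matrix is C x C -- its size does not depend on N,
    and the overall cost grows linearly with the window area.
    """
    q, k, v = wq @ x, wk @ x, wv @ x            # projections, each (C, N)
    attn = (q @ k.T) / np.sqrt(q.shape[1])      # (C, C), not (N, N)
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)    # row-wise softmax
    return attn @ v                             # (C, N) output features

# Hypothetical example: a 4-channel feature map over an 8x8 window.
C, H, W = 4, 8, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((C, H * W))
wq, wk, wv = (0.1 * rng.standard_normal((C, C)) for _ in range(3))
y = channel_self_attention(x, wq, wk, wv)
print(y.shape)  # (4, 64)
```

Doubling H and W here doubles only the cost of the matrix products involving N, whereas standard spatial self-attention would quadruple the attention matrix itself.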
