Multiscale Context-Aware Network for Remote Sensing Images Semantic Segmentation
Abstract
Remote sensing images typically exhibit significant scale and appearance variations, so deep learning-based semantic segmentation methods must capture both local and global contextual information to improve segmentation accuracy and handle complex scenes. CNN-based methods have been widely applied and excel at capturing local details, but they are limited in modeling global context. Transformers, by relying on multi-head self-attention, can effectively capture global dependencies, but they often incur high computational costs. In this paper, we propose a Multiscale Context-Aware Network (MCANet) that combines CNN and Transformer architectures to comprehensively and efficiently model multiscale local and global contextual relationships. Specifically, an Adaptive Feature Enhancement Module (AFEM) is designed to enhance the local contextual representations of multiscale features in the encoder using large-kernel strip convolutions and frequency-adaptive weighting. Meanwhile, we develop a Multi-scale Global-context Transformer Block (MGTB) in the decoder to efficiently extract global contextual information across different scales. Furthermore, a Feature Fusion Module (FFM) is introduced to integrate the local context enhanced by AFEM with the global context generated by MGTB, further promoting the joint learning of local and global information. Extensive quantitative and qualitative experiments on the public Vaihingen and Potsdam datasets demonstrate that the proposed MCANet achieves superior segmentation performance compared with existing mainstream methods.
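The abstract does not give implementation details, so the PyTorch sketch below only illustrates, under stated assumptions, what a large-kernel strip-convolution block with frequency-adaptive weighting (the two ideas named for AFEM) could look like. The module names (StripConv, LocalContextBlock), the kernel size of 11, the depthwise design, and the low/high-frequency split via average pooling are all hypothetical choices for illustration, not the authors' actual AFEM.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripConv(nn.Module):
    """Large-kernel strip convolution: a 1xk followed by a kx1 depthwise
    convolution approximates a kxk receptive field at much lower cost
    than a full kxk kernel."""
    def __init__(self, channels: int, kernel_size: int = 11):
        super().__init__()
        pad = kernel_size // 2
        self.horizontal = nn.Conv2d(channels, channels, (1, kernel_size),
                                    padding=(0, pad), groups=channels)
        self.vertical = nn.Conv2d(channels, channels, (kernel_size, 1),
                                  padding=(pad, 0), groups=channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.vertical(self.horizontal(x))

class LocalContextBlock(nn.Module):
    """Hypothetical AFEM-like block: strip convolutions enlarge the local
    receptive field, then learned per-channel gates reweight low- and
    high-frequency components (a simple stand-in for the paper's
    frequency-adaptive weighting)."""
    def __init__(self, channels: int, kernel_size: int = 11):
        super().__init__()
        self.strip = StripConv(channels, kernel_size)
        self.low_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.high_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.strip(x)
        # Split into a low-frequency (blurred) part and a high-frequency residual.
        low = F.avg_pool2d(y, kernel_size=3, stride=1, padding=1)
        high = y - low
        # Adaptively reweight each band per channel, with a residual connection.
        return x + self.low_gate(y) * low + self.high_gate(y) * high

if __name__ == "__main__":
    feats = torch.randn(2, 64, 128, 128)  # a batch of encoder feature maps
    block = LocalContextBlock(channels=64)
    print(block(feats).shape)  # torch.Size([2, 64, 128, 128])
```

The depthwise 1xk/kx1 factorization keeps the parameter count linear in the kernel size, which is why strip convolutions are a common way to obtain large local receptive fields cheaply; the gated band split is one plausible reading of "frequency-adaptive weighting", not a description of the published design.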