A Dynamic Attention Mechanism for Road Extraction from High-Resolution Remote Sensing Imagery Using Feature Fusion
Abstract
Accurate road information is critical for intelligent navigation and urban planning. Compared with traditional road detection methods, deep learning-based approaches have shown significant advantages in extracting roads from remote sensing imagery. However, occlusion by vegetation and buildings, together with the visual similarity between roads and surrounding objects, often leads to incomplete road extraction. To address these issues, we propose a novel deep learning model, RISENet, which consists of three main components: a dual-branch fusion encoder, a multi-layer dynamic spatial channel fusion attention mechanism (MCSA), and a hybrid feature dilation-aware decoder. The dual-branch encoder uses dual convolutions and multi-head deep convolutions to extract fundamental features and capture fine-grained details. The feature fusion module integrates global and local information, strengthening the model’s feature representation. The MCSA captures long-range dependencies within remote sensing images, improving the discrimination between roads and other objects. The dilation-aware decoder dynamically expands the receptive field, preserving global features while reducing the loss of fine details. RISENet was evaluated on three distinct road segmentation benchmarks, achieving accuracies of 90.04%, 92.24%, and 88.18%, respectively. In both visual quality and quantitative metrics, the proposed method performs strongly. The ablation experiments further confirm the effectiveness of the adopted loss function and fusion strategy. These results indicate that RISENet performs well on road segmentation across different datasets and exhibits considerable robustness.
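As a rough illustration of why dilation enlarges the receptive field, the sketch below computes the theoretical receptive field of a stack of convolution layers. It is not RISENet's decoder: the function name `receptive_field`, the 3x3 kernels, and the dilation rates 1, 2, 4 are assumptions chosen only for this example, since the abstract does not specify the decoder's configuration.

```python
def receptive_field(layers):
    """Theoretical receptive field (in input pixels, along one axis) of
    stacked convolutions.

    `layers` is a sequence of (kernel_size, dilation, stride) tuples.
    Hypothetical helper for illustration; not part of RISENet.
    """
    rf = 1    # a single output pixel initially sees one input pixel
    jump = 1  # spacing, in input pixels, between adjacent feature positions
    for kernel_size, dilation, stride in layers:
        # A dilated kernel spans (kernel_size - 1) * dilation + 1 positions.
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


# Three 3x3 layers with stride 1: plain vs. dilated (assumed rates 1, 2, 4).
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 2, 1), (3, 4, 1)]

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With the same number of layers and parameters, the dilated stack covers roughly twice the spatial extent per axis without any downsampling, which is the general property a dilation-aware decoder can exploit to keep global context while retaining fine detail.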