DTULC Dataset and CDGANet: Advancing Urban Land Cover Segmentation with High-Resolution Satellite Imagery
Abstract
Semantic segmentation of high-resolution remote sensing imagery remains challenging due to the coexistence of multi-scale objects, complex spatial contexts, and ambiguous boundaries. While existing convolutional and transformer-based methods have made strides in natural scene understanding, they often underperform on remote sensing data due to insufficient detail retention and inefficient multi-scale feature modeling. To address these limitations, we propose CDGANet, a novel architecture integrating a Cross-Layer Detail-Aware Module (CDM) and a Group Collaborative Attention Mechanism (GCAM). Leveraging ConvNeXt as the backbone, CDGANet employs CDM to fuse high-level semantics with low-level textures via self-attention, preserving boundary precision, while GCAM processes multi-scale features in parallel groups to enhance small-object discrimination. We introduce the DTULC dataset, a high-resolution urban land cover benchmark derived from Gaofen-2 satellite imagery, capturing diverse landscapes in Datong City, Shanxi Province. Experiments demonstrate CDGANet’s superiority over UNet, PSPNet, and Swin Transformer, achieving state-of-the-art performance with 74.23% mPA, 58.91% mIoU, and a 72.24% F1-score on DTULC. Ablation studies confirm that GCAM and CDM jointly improve mIoU by 7.07% over the baseline model. This work advances fine-grained land cover analysis and offers practical value for ecological monitoring and sustainable urban planning.
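The group-wise attention idea behind GCAM, splitting a feature map into channel groups and attending over spatial positions within each group in parallel, can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification for intuition only, not the paper's actual GCAM implementation; the function name, group count, and scaling are assumptions.

```python
import numpy as np

def grouped_spatial_attention(x, num_groups=4):
    """Illustrative group-wise spatial self-attention (not the paper's GCAM).

    x: feature map of shape (C, H, W); C must be divisible by num_groups.
    Each channel group independently computes a spatial affinity matrix
    and re-weights its own features, so groups can specialize on
    different scales before being concatenated back together.
    """
    C, H, W = x.shape
    assert C % num_groups == 0, "channels must split evenly into groups"
    outputs = []
    for g in np.split(x, num_groups, axis=0):
        cg = g.shape[0]
        f = g.reshape(cg, H * W)              # flatten spatial dims: (Cg, HW)
        attn = (f.T @ f) / np.sqrt(cg)        # spatial affinity: (HW, HW)
        attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)   # row-wise softmax
        outputs.append((f @ attn).reshape(cg, H, W))
    return np.concatenate(outputs, axis=0)    # restore (C, H, W)
```

Because each group's affinity matrix is computed independently, the groups can run in parallel, which is the efficiency motivation for grouped over full-channel attention.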