CSM-DETR: Construction Site Monitoring via Mamba-Enhanced Detection Transformer for UAV Aerial Imagery

Abstract

Unmanned Aerial Vehicles (UAVs) offer clear advantages for construction site monitoring through flexible deployment and high-resolution imagery. However, existing vision-based detection methods struggle with extreme scale variations, dense object distributions, complex backgrounds, and real-time processing constraints. To address these limitations, we propose CSM-DETR, a novel detection transformer designed specifically for UAV-based construction monitoring. Our framework adopts MobileMamba as its backbone to achieve linear computational complexity $\mathcal{O}(n)$ while capturing long-range spatial dependencies, and incorporates a Hierarchical Local-Aware Fusion (HLAF) mechanism for adaptive multi-scale feature aggregation. We further propose three key innovations: (1) a Dual-Attention Spatial Integration (DASI) module that enhances multi-scale spatial feature representation through parallel local and global attention streams; (2) a Cross-Scale Deformable Fusion (CSDF) module that enables flexible cross-scale feature interaction through deformable sampling; and (3) a Scale-Aware Composite Loss (SAC Loss) that provides scale-dependent supervision for challenging small objects. We also construct UAV-CSM47, a benchmark dataset containing 15,860 high-resolution aerial images with 47 construction-related object categories. Extensive experiments show that CSM-DETR achieves state-of-the-art performance with 91.8\% mAP@0.5 and 73.6\% mAP@0.5:0.95, outperforming YOLOv13-L by 3.3 percentage points and Co-DETR by 2.7 percentage points while maintaining real-time inference at 38 FPS. Ablation studies validate the effectiveness of each component, and cross-domain evaluation confirms strong generalization capability. The proposed system provides a practical solution for automated construction site monitoring, with applications in safety supervision, progress tracking, and resource management.
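
For readers unfamiliar with the two reported metrics, the following minimal Python sketch illustrates how the IoU thresholds behind mAP@0.5 and mAP@0.5:0.95 differ. It assumes the common COCO-style convention (ten IoU thresholds from 0.50 to 0.95 in steps of 0.05); the abstract does not state the exact evaluation protocol, and the function names `iou` and `thresholds_passed` are hypothetical, not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Return the IoU thresholds at which this prediction counts as a match."""
    overlap = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]


if __name__ == "__main__":
    gt = (0, 0, 100, 100)
    pred = (0, 0, 100, 72)  # covers 72% of the ground-truth box
    print(iou(pred, gt))
    print(thresholds_passed(pred, gt))
```

Under this convention, mAP@0.5 counts a prediction as correct once its IoU reaches 0.5, whereas mAP@0.5:0.95 averages AP over all ten thresholds and therefore rewards tighter localization, which is why the second figure is typically lower.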
