YOLOX-SEH: Efficient Small Target Detection in Remote Sensing Images
Abstract
Object detection is a crucial and challenging problem in computer vision. In remote sensing images, objects can appear in arbitrary orientations, which complicates accurate detection. Moreover, the prevalence of small objects and environmental factors poses significant challenges to existing deep learning-based object detection algorithms. To address these issues, this paper proposes a YOLOX-based object detection algorithm named YOLOX-SEH. To improve the detection of objects appearing in arbitrary orientations, a Swin Transformer structure is incorporated into the YOLOX framework along with other optimizations. The Swin Transformer employs shifted windows and a hierarchical structure for more efficient and flexible computation, capturing global information at multiple scales and improving feature extraction. In addition, an Explicit Visual Center (EVC) structure is introduced to capture global long-range dependencies through a lightweight MLP network, enhancing both the accuracy and the efficiency of the detector. To improve the detection of small objects, an additional prediction head for tiny objects is added, complementing YOLOX’s original three prediction heads for effective multi-scale detection. Extensive experiments on the DOTA v1.0 and DIOR datasets demonstrate that YOLOX-SEH performs well on remote sensing imagery with a small model size of only 8 MB.
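To make the lightweight-MLP idea behind the EVC-style block concrete, the following is a minimal PyTorch sketch of a residual block that mixes spatial context with a depthwise convolution and then applies a per-pixel channel MLP. It is an illustrative assumption in the spirit of the description above, not the authors' implementation; the class name, layer choices, and shapes are hypothetical.

```python
import torch
import torch.nn as nn

class LightweightMLPBlock(nn.Module):
    """Illustrative global-context block (assumption, not the paper's code):
    depthwise conv for spatial mixing followed by a channel MLP."""
    def __init__(self, channels: int, expansion: int = 4):
        super().__init__()
        self.norm1 = nn.BatchNorm2d(channels)
        # Depthwise 3x3 conv mixes spatial context per channel.
        self.dwconv = nn.Conv2d(channels, channels, kernel_size=3,
                                padding=1, groups=channels)
        self.norm2 = nn.BatchNorm2d(channels)
        hidden = channels * expansion
        # 1x1 convolutions act as a per-pixel channel MLP.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.dwconv(self.norm1(x))  # residual spatial mixing
        x = x + self.mlp(self.norm2(x))     # residual channel MLP
        return x


if __name__ == "__main__":
    feat = torch.randn(1, 256, 20, 20)           # e.g. a coarse-level feature map
    out = LightweightMLPBlock(256)(feat)
    print(out.shape)                             # torch.Size([1, 256, 20, 20])
```

Such a block could be inserted on top of a neck feature map before the prediction heads; the exact placement and channel widths in YOLOX-SEH are not specified here and would need to follow the paper.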