Object Localization in Images via Fusion of SFM and YOLO


Abstract

With the rapid development of computer vision, unmanned aerial vehicle (UAV) technology, and autonomous driving, precisely acquiring the geospatial coordinates of captured objects has become a core requirement for numerous applications. Traditional object detection methods, such as the YOLO series of algorithms, focus primarily on object categories and two-dimensional bounding boxes and cannot precisely estimate the spatial position of objects. This paper proposes a novel framework that fuses the Structure from Motion (SfM) algorithm with an improved YOLOv8 object detection model to achieve high-precision geospatial localization of objects of interest in image sequences. First, the SfM algorithm performs three-dimensional reconstruction from multi-view images, generating sparse point clouds with spatial coordinates. In parallel, a YOLOv8 model incorporating a hybrid attention mechanism (GAM and CA) detects the target objects. Finally, the 3D point cloud is reprojected onto the image plane and matched against the two-dimensional detection boxes under spatial constraints, establishing a mapping between the two-dimensional detection results and three-dimensional spatial coordinates. Ablation studies on the VisDrone dataset confirm the effectiveness of the hybrid attention mechanism, and localization experiments on vehicle image sequences collected by UAVs verify that the proposed method can accurately acquire the geographical coordinates of captured objects. This study not only expands the functional boundaries of traditional object detection but also provides technical support for application scenarios such as UAV visual localization and intelligent traffic monitoring.
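
The reprojection-and-matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain pinhole camera without lens distortion, a known intrinsic matrix `K` and pose `R`, `t` for the image, points already expressed in the target coordinate frame, and a simple rule (the median of the points that fall inside the box) standing in for the paper's spatial constraint matching, whose details the abstract does not give. The function names `project_points` and `localize_box` are hypothetical.

```python
import numpy as np


def project_points(points_world, K, R, t):
    """Project N x 3 world points to pixel coordinates (pinhole model, no distortion)."""
    cam = points_world @ R.T + t                 # world frame -> camera frame
    depth = cam[:, 2]
    in_front = depth > 1e-9                      # points behind the camera are invalid
    safe_depth = np.where(in_front, depth, 1.0)  # avoid dividing by zero or negatives
    homogeneous = cam @ K.T                      # apply the intrinsic matrix
    pixels = homogeneous[:, :2] / safe_depth[:, None]
    return pixels, in_front


def localize_box(points_world, K, R, t, box):
    """Estimate a 3D position for one detection box (x_min, y_min, x_max, y_max).

    Assumption: the object position is taken as the median of the sparse
    points whose projections fall inside the box. Returns None if no
    point projects into the box.
    """
    x_min, y_min, x_max, y_max = box
    pixels, in_front = project_points(points_world, K, R, t)
    inside = (
        in_front
        & (pixels[:, 0] >= x_min) & (pixels[:, 0] <= x_max)
        & (pixels[:, 1] >= y_min) & (pixels[:, 1] <= y_max)
    )
    if not inside.any():
        return None
    return np.median(points_world[inside], axis=0)


# Toy data (made up for illustration): camera at the origin looking along +Z.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
points = np.array([
    [0.0, 0.0, 10.0],    # projects to (640, 360)
    [0.1, 0.0, 10.0],    # projects to (650, 360)
    [-0.1, 0.1, 10.0],   # projects to (630, 370)
    [5.0, 5.0, 10.0],    # projects far outside the box
    [0.0, 0.0, -5.0],    # behind the camera
])
detection_box = (600.0, 320.0, 680.0, 400.0)
position = localize_box(points, K, R, t, detection_box)
```

In practice the sparse points inside a box often include background, so a more robust rule (for example depth clustering or outlier rejection) would likely be needed. If the SfM reconstruction is in an arbitrary frame, a separate georeferencing transform is also required before the result can be read as geographic coordinates.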
