A Deep Learning-Based Visual SLAM Approach for Indoor Dynamic Scenes
Abstract
In indoor dynamic scenes, traditional visual SLAM algorithms often suffer a significant loss of localization accuracy because of interference from dynamic objects. To address this issue, this study proposes a deep learning-based visual SLAM method. Built on the ORB-SLAM3 framework, the method improves robustness in complex dynamic environments through a dual dynamic-feature suppression mechanism that combines object detection with epipolar geometric constraints. Specifically, the YOLOv8m network is first made lightweight by replacing its original convolutional layers with GhostConv modules, which reduces computational overhead. In addition, the CBAM attention mechanism is incorporated into the backbone network to improve the accuracy of dynamic object detection. To further enhance feature stability in complex scenes, the algorithm extracts LSD line features jointly with point features. Leveraging the robustness of line features in low-texture or dynamic edge regions, it first filters out obvious dynamic features. Epipolar geometric constraints are then applied as a secondary screening of residual dynamic points, so that static features are retained with high precision. Finally, pose-assisted localization is performed with the P3P algorithm on the filtered static features. Experimental results on the TUM RGB-D dataset show that the proposed algorithm improves localization accuracy by an average of 53.10% over ORB-SLAM3, with a maximum improvement of 93.56%. On the EuRoC-MAV dataset, it reduces the average ATE-RMSE by 85.32%, with improvements exceeding 85.5% on highly challenging sequences, and keeps the ATE-StD below 0.035 m under complex operating conditions, improving both localization accuracy and anti-interference capability.
Physical validation further confirms the superior accuracy and robustness of the proposed method, offering an effective approach for research on high-precision SLAM systems in dynamic environments.
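The secondary screening step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a fundamental matrix F has already been estimated between two frames (for instance from matches lying outside the detected object regions) and flags a matched point as static when it lies close to its epipolar line. The function names, the 1-pixel threshold, and the use of NumPy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def epipolar_distances(F, pts_prev, pts_curr):
    """Pixel distance from each current-frame point to the epipolar line
    induced by its match in the previous frame.

    F        : (3, 3) fundamental matrix with p_curr^T F p_prev = 0 for static points
    pts_prev : (N, 2) pixel coordinates in the previous frame
    pts_curr : (N, 2) matched pixel coordinates in the current frame
    """
    ones = np.ones((pts_prev.shape[0], 1))
    p1 = np.hstack([pts_prev, ones])   # homogeneous points, previous frame
    p2 = np.hstack([pts_curr, ones])   # homogeneous points, current frame
    lines = p1 @ F.T                   # row i is the line l_i = F p1_i = (a, b, c)
    num = np.abs(np.sum(lines * p2, axis=1))      # |a*x + b*y + c|
    den = np.hypot(lines[:, 0], lines[:, 1])      # sqrt(a^2 + b^2)
    return num / np.maximum(den, 1e-12)           # guard against degenerate lines


def static_mask(F, pts_prev, pts_curr, threshold_px=1.0):
    """True where a match is consistent with the epipolar geometry.

    threshold_px is a hypothetical value chosen for illustration only.
    """
    return epipolar_distances(F, pts_prev, pts_curr) < threshold_px
```

Points that fail the check would be treated as residual dynamic points and excluded before pose estimation. In practice the threshold would need tuning against the noise level of the feature matcher.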