A Deep Learning-Based Visual SLAM Approach for Indoor Dynamic Scenes

Jiarui Qin
Yugang Wang
Xueli Cong
Liyao Zhou

Abstract

In indoor dynamic scenes, traditional visual SLAM algorithms often lose significant localization accuracy because of interference from dynamic objects. To address this issue, this study proposes a deep learning-based visual SLAM method. Built on the ORB-SLAM3 framework, the method improves performance in complex dynamic environments through a dual dynamic feature suppression mechanism that combines object detection with epipolar geometric constraints. Specifically, the YOLOv8m network is first made lighter by replacing its original convolutional layers with GhostConv modules, which reduces computational overhead. The CBAM attention mechanism is also incorporated into the backbone network to improve the accuracy of dynamic object detection. To further improve feature stability in complex scenes, the algorithm extracts LSD line features jointly with point features. Exploiting the robustness of line features in low-texture regions and at dynamic edges, it first filters out obvious dynamic features. Epipolar geometric constraints are then applied as a secondary screening of the residual dynamic points, so that static features are retained with high precision. Finally, pose-assisted localization is performed with the P3P algorithm on the filtered static features. Experimental results on the TUM RGB-D dataset show that the proposed algorithm improves localization accuracy by an average of 53.10% compared with ORB-SLAM3, with a maximum improvement of 93.56%. On the EuRoC-MAV dataset, it reduces the average ATE-RMSE by 85.32%, with improvements exceeding 85.5% on highly challenging sequences, and keeps the ATE-StD below 0.035 m under complex operating conditions, improving both localization accuracy and resistance to interference. Physical validation further confirms the accuracy and robustness of the proposed method, offering an effective approach for research on high-precision SLAM systems in dynamic environments.
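
The two-stage suppression described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the box format (x_min, y_min, x_max, y_max), the 1-pixel threshold, and the assumption that a fundamental matrix F between the two frames is already available are all hypothetical choices made for illustration. Stage one discards matched points that fall inside a detected dynamic-object box; stage two discards remaining points whose distance to their epipolar line exceeds the threshold.

```python
import numpy as np


def in_any_box(points, boxes):
    """Boolean mask: True where a point lies inside at least one box.

    points: (N, 2) pixel coordinates (x, y).
    boxes:  (M, 4) boxes as (x_min, y_min, x_max, y_max) -- assumed format.
    """
    points = np.asarray(points, dtype=float).reshape(-1, 2)
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    x = points[:, 0:1]  # shape (N, 1), broadcasts against (M,)
    y = points[:, 1:2]
    inside = (
        (x >= boxes[:, 0]) & (x <= boxes[:, 2])
        & (y >= boxes[:, 1]) & (y <= boxes[:, 3])
    )  # shape (N, M)
    return inside.any(axis=1)


def epipolar_distance(pts_prev, pts_curr, F):
    """Pixel distance from each current point to the epipolar line of its match.

    F is the fundamental matrix mapping a previous-frame point p1 to the
    line l = F @ p1 in the current frame (assumed to be given).
    """
    pts_prev = np.asarray(pts_prev, dtype=float).reshape(-1, 2)
    pts_curr = np.asarray(pts_curr, dtype=float).reshape(-1, 2)
    ones = np.ones((len(pts_prev), 1))
    p1 = np.hstack([pts_prev, ones])  # homogeneous coordinates
    p2 = np.hstack([pts_curr, ones])
    lines = p1 @ np.asarray(F, dtype=float).T  # rows are (a, b, c)
    numerator = np.abs(np.sum(lines * p2, axis=1))
    denominator = np.hypot(lines[:, 0], lines[:, 1])
    return numerator / np.maximum(denominator, 1e-12)


def static_mask(pts_prev, pts_curr, boxes, F, max_dist=1.0):
    """True for matches kept as static after both screening stages."""
    outside_boxes = ~in_any_box(pts_curr, boxes)       # stage 1: detection
    consistent = epipolar_distance(pts_prev, pts_curr, F) <= max_dist  # stage 2
    return outside_boxes & consistent


if __name__ == "__main__":
    # Toy geometry: pure horizontal camera translation with identity
    # intrinsics, so every epipolar line is the row y' = y.
    F = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.0, -1.0],
                  [0.0, 1.0, 0.0]])
    prev = [[10, 10], [20, 20], [30, 30]]
    curr = [[12, 10], [25, 24], [33, 30]]
    boxes = [[30, 25, 40, 35]]  # one hypothetical detection box
    print(static_mask(prev, curr, boxes, F))
```

In the toy example, the first match is kept, the second is rejected by the epipolar check, and the third is rejected because it lies inside the detection box. In a real pipeline F would have to be estimated from the matches themselves, and the threshold would need tuning to the camera and feature noise.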

Version published to 10.21203/rs.3.rs-8981127/v1 on Research Square
Mar 30, 2026

Related articles

BPC-SLAM: Part-Level Dynamic Suppression and Structure-Constrained RGB-D SLAM for Human-Centric Dynamic Environments

This article has 5 authors:
1. Wang Yang
2. Jiupeng Chen
3. Hongjun San
4. Fan Zhang
5. Wunyu Xu
This article has no evaluations
Latest version Apr 2, 2026

CFA-DeepLabV3+: Cross-level Fusion and Attention Network for Lightweight Road Segmentation

This article has 6 authors:
1. Xin Zhang
2. Yan Li
3. Zexi Hua
4. XiangZhen Zhou
5. YuGe Pan
6. Hui Qiao
This article has no evaluations
Latest version Apr 8, 2026

A feature enhancement and attention fusion network for small object detection in UAV imagery

This article has 4 authors:
1. Xilong Xu
2. Peng Li
3. Hongwei Ding
4. Jinhua Yang
This article has no evaluations
Latest version Mar 23, 2026
