A Robust Framework Fusing Visual SLAM and 3D Gaussian Splatting with a Coarse-Fine Method for Dynamic Regions Segmentation
Abstract
Existing systems integrating neural representations with visual SLAM excel in static scenes but falter in dynamic environments, where moving objects degrade localization and mapping performance. To address this, we propose a robust dynamic SLAM framework that leverages explicit geometric features for localization while learning implicit photometric feature representations to capture the texture of the observed environment. Our method first employs an instance segmentation network and a Kalman filter for multi-object tracking. We then introduce a cascaded, coarse-to-fine strategy for efficient motion analysis. A lightweight, sparse optical flow method along object contours performs an initial coarse screening to identify clearly static or globally moving objects. For ambiguous targets requiring detailed analysis, a fine-grained motion segmentation is then conducted using dense optical flow clustering. By excluding features on identified dynamic regions, our system significantly improves camera pose estimation accuracy, reducing absolute trajectory error by up to 95% on dynamic TUM RGB-D sequences compared to ORB-SLAM3, and generates cleaner dense maps. The mapping backend utilizes a 3D Gaussian Splatting renderer, optimized with a Gaussian pyramid-based training strategy. Evaluations on diverse datasets demonstrate our system's superior robustness, achieving accurate localization and high-quality mapping in dynamic scenarios, while the cascaded strategy reduces motion analysis computation time by 91.7% compared to a dense-only approach.
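The cascaded motion analysis described above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function names, the two thresholds, and the use of simple 1-D k-means on flow magnitudes (standing in for the paper's dense optical flow clustering) are all assumptions. The coarse stage inspects sparse flow sampled along an object's contour and commits only to the easy cases; everything else falls through to the dense fine stage.

```python
import numpy as np

def coarse_screen(contour_flow, static_thresh=0.5, moving_thresh=3.0):
    """Coarse stage: classify an object from sparse optical flow vectors
    sampled at its contour points (shape (N, 2)).

    Thresholds are illustrative, not from the paper.
    Returns 'static', 'dynamic', or 'ambiguous'.
    """
    mag = np.linalg.norm(contour_flow, axis=1)
    if mag.mean() < static_thresh:
        return "static"      # negligible motion along the whole contour
    if mag.min() > moving_thresh:
        return "dynamic"     # the entire contour moves coherently
    return "ambiguous"       # mixed evidence -> run the fine stage

def fine_segment(dense_flow, k=2, iters=10):
    """Fine stage: cluster dense flow magnitudes (H, W, 2) with a tiny
    1-D k-means; the faster cluster is treated as the moving region.
    Returns a boolean (H, W) mask of dynamic pixels."""
    mag = np.linalg.norm(dense_flow, axis=-1).ravel()
    centers = np.linspace(mag.min(), mag.max(), k)
    for _ in range(iters):
        labels = np.abs(mag[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = mag[labels == j].mean()
    return (labels == centers.argmax()).reshape(dense_flow.shape[:2])
```

The efficiency gain reported in the abstract comes from the coarse stage resolving most objects with only a handful of contour samples, so the dense clustering runs only on the ambiguous minority.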