Coarse-to-Fine Multi-View 3D Reconstruction with SLAM Optimization and Transformer-Based Matching
Abstract
Reconstructing 3D scenes from multi-view datasets remains challenging in computer vision because of variations in viewpoint and in the overlapping regions among images. This study proposes a coarse-to-fine structured light framework that integrates sparse and dense feature matching to improve both the efficiency and the accuracy of multi-view 3D reconstruction. By incorporating a Simultaneous Localization and Mapping (SLAM)-based approach and parallel bundle adjustment, the framework outperforms existing frameworks on three key metrics: feature matching accuracy, reprojection error, and camera trajectory precision. The approach also introduces a Transformer-based multi-view matching module to improve robustness and uses a hybrid loss function to optimize reconstruction accuracy. Experimental results on public multi-view datasets confirm substantial improvements across standard evaluation metrics, indicating that the framework is effective in addressing multi-view inconsistency.
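For readers unfamiliar with reprojection error, one of the metrics named above and the quantity that bundle adjustment typically minimizes, the following is a minimal sketch under a standard pinhole camera model. It is not the authors' implementation: the function names (`project`, `mean_reprojection_error`), the intrinsic matrix `K`, the pose `(R, t)`, and the sample points are all illustrative assumptions.

```python
import math

def project(K, R, t, X):
    """Project one 3D world point X to pixel coordinates (pinhole model)."""
    # World frame -> camera frame: Xc = R @ X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Camera frame -> homogeneous pixel coordinates: K @ Xc
    u, v, w = (sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3))
    return (u / w, v / w)

def mean_reprojection_error(K, R, t, points, observations):
    """Mean Euclidean distance (pixels) between projected and observed points."""
    errors = [math.dist(project(K, R, t, X), obs)
              for X, obs in zip(points, observations)]
    return sum(errors) / len(errors)

# Hypothetical camera: focal length 500 px, principal point (320, 240),
# identity rotation, zero translation.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

points = [[0.0, 0.0, 2.0], [0.2, -0.1, 4.0]]
# The first observation is offset by (3, 4) px from its true projection;
# the second matches its projection exactly.
observations = [(323.0, 244.0), (345.0, 227.5)]

err = mean_reprojection_error(K, R, t, points, observations)
print(f"mean reprojection error: {err:.2f} px")
```

In a bundle adjustment setting, the camera poses and 3D points would be adjusted jointly to reduce this error over all views; the sketch only evaluates it for a single fixed camera.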