Remote Sensing Multi-View Stereo using ConvLSTM Guided Iterative Depth Refinement
Abstract
Multi-view 3D reconstruction from remote sensing imagery has emerged as a critical research direction in both computer vision and remote sensing. While deep learning-based methods have demonstrated remarkable success in close-range reconstruction, remote sensing scenarios continue to pose significant challenges. These include edge blurring due to varying viewpoints, noise artifacts in shadowed regions, and discontinuous depth estimates in areas with smoothly varying elevation, all of which hinder accurate matching and degrade reconstruction quality. To address these issues, we propose a novel end-to-end network, termed CGIDR-Net, which enhances remote sensing multi-view stereo (MVS) reconstruction through ConvLSTM-guided iterative depth refinement. Specifically, we design a Deformable Channel Transformation Module (DCTM) to alleviate edge blurring across views by adaptively capturing spatial and channel-wise variations. Furthermore, we introduce a Differentiable 3D Masked Warping (MW) mechanism that leverages learnable masks to construct a more reliable cost volume, effectively suppressing occlusions and geometric distortions. Finally, we incorporate an Iterative Depth Refinement Module (IDRM) based on ConvLSTM, which progressively integrates spatial and contextual cues to refine depth predictions. Extensive experiments on several public datasets, including WHU, LuoJia-MVS, WHU-OMVS, and DTU, demonstrate that CGIDR-Net achieves superior performance in both accuracy and robustness compared to existing state-of-the-art methods.
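To make the ConvLSTM-guided refinement idea concrete, the sketch below shows one plausible way an IDRM-style module could be wired up in PyTorch: a ConvLSTM cell whose gates are 2D convolutions carries a spatial hidden state across iterations, and at each iteration the cell consumes the current depth map plus reference-image features and emits a residual depth update. This is only a minimal illustration under assumed details; the class names (`ConvLSTMCell`, `IterativeDepthRefinement`), channel widths, iteration count, and residual-update formulation are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """ConvLSTM cell: LSTM gates computed with 2D convolutions so the
    recurrent state keeps its spatial layout (B x C x H x W)."""

    def __init__(self, in_ch, hidden_ch, kernel_size=3):
        super().__init__()
        self.hidden_ch = hidden_ch
        # One convolution produces all four gates (input, forget, output, candidate).
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, c


class IterativeDepthRefinement(nn.Module):
    """Hypothetical IDRM-style refinement: each iteration predicts a residual
    correction to the current depth estimate from the ConvLSTM hidden state."""

    def __init__(self, feat_ch=32, hidden_ch=32, num_iters=3):
        super().__init__()
        self.num_iters = num_iters
        self.cell = ConvLSTMCell(feat_ch + 1, hidden_ch)  # features + 1-channel depth
        self.to_residual = nn.Conv2d(hidden_ch, 1, 3, padding=1)

    def forward(self, init_depth, feats):
        # init_depth: (B, 1, H, W) coarse depth; feats: (B, feat_ch, H, W) reference features.
        b, _, h, w = init_depth.shape
        state = (feats.new_zeros(b, self.cell.hidden_ch, h, w),
                 feats.new_zeros(b, self.cell.hidden_ch, h, w))
        depth = init_depth
        for _ in range(self.num_iters):
            hidden, cell = self.cell(torch.cat([depth, feats], dim=1), state)
            state = (hidden, cell)
            depth = depth + self.to_residual(hidden)  # residual depth update
        return depth


if __name__ == "__main__":
    refiner = IterativeDepthRefinement()
    coarse = torch.rand(2, 1, 64, 64)        # coarse depth from the cost volume
    features = torch.rand(2, 32, 64, 64)     # reference-image features
    refined = refiner(coarse, features)
    print(refined.shape)  # torch.Size([2, 1, 64, 64])
```

The residual formulation is a common design choice for iterative refinement (the network only learns corrections, so early iterations cannot destroy the coarse estimate); whether CGIDR-Net uses residual or direct depth regression is not specified in the abstract.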