LiDAR-Camera Range-View Fusion for 3D Object Detection in Autonomous Driving
Abstract
In recent years, LiDAR-Camera fusion for 3D object detection has emerged as a research hotspot, owing to its superior performance compared to single-sensor approaches. However, the vast majority of methods adopt fusion in the point, voxel, or BEV domains. The distinct representations of LiDAR and camera data lead to domain discrepancies and misaligned features, resulting in suboptimal detection accuracy and slow inference. In contrast, this paper introduces a novel range-view fusion strategy, leveraging its unique advantages over other methods: data homogeneity, accurate feature alignment, and high efficiency. We propose the Range-view Pyramidal Fusion network (RPfusion), in which two modules are specifically designed to address the inherent problems of range-view fusion: the significant scale variation of objects and the vanishing features of occluded objects. First, an Appearance Feature Fusion (AFF) module adaptively integrates complementary geometric and semantic information across different scales using dilated cross-attention. Next, a Range-Guided Cross-Layer Modulation (RGCLM) module is proposed to extract rich contextual information for comprehensive perception and to enhance the features of occluded targets. The output features can globally decorate the raw point cloud, serving as input to any LiDAR-based detector. Additionally, we propose RPfusion-Det, which employs a RoIfusion module with grid-wise self-attention for local fusion, leading to more precise box refinement. Extensive experiments on the KITTI and nuScenes datasets demonstrate that RPfusion consistently enhances various detection models, while RPfusion-Det achieves superior accuracy with efficient inference.
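The abstract does not spell out the internals of AFF or RGCLM, but the range-view pipeline it describes rests on two generic steps: spherically projecting the LiDAR points into a range image, where camera features can be fused with pixel-level alignment, and gathering the fused features back onto the raw points (the "decoration" step) before feeding a LiDAR-based detector. The PyTorch sketch below illustrates only these two standard steps under stated assumptions; the image resolution, field-of-view values, and function names are illustrative choices typical of 64-beam sensors, not details taken from the paper.

import torch

def spherical_project(points, H=64, W=512, fov_up=3.0, fov_down=-25.0):
    # Project an (N, 3) LiDAR point cloud onto an (H, W) range image.
    # Returns per-point row/column indices so that features computed in
    # the range view can later be gathered back onto the raw points.
    # The FOV defaults follow common 64-beam setups (assumption, not
    # a value from the paper).
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = torch.sqrt(x ** 2 + y ** 2 + z ** 2).clamp(min=1e-6)

    yaw = torch.atan2(y, x)        # azimuth in [-pi, pi]
    pitch = torch.asin(z / depth)  # elevation

    fov_up_rad = fov_up * torch.pi / 180.0
    fov_down_rad = fov_down * torch.pi / 180.0
    fov = fov_up_rad - fov_down_rad

    # Normalize the angles to pixel coordinates.
    u = 0.5 * (1.0 - yaw / torch.pi) * W           # column from azimuth
    v = (1.0 - (pitch - fov_down_rad) / fov) * H   # row from elevation

    u = u.long().clamp(0, W - 1)
    v = v.long().clamp(0, H - 1)
    return v, u, depth

def decorate_points(points, fused_range_features):
    # Append fused range-view features of shape (C, H, W) to the raw
    # points (N, 3), yielding an (N, 3 + C) tensor that any point-based
    # detector can consume. In RPfusion the feature map would come from
    # the AFF/RGCLM modules; here any (C, H, W) map stands in for it.
    C, H, W = fused_range_features.shape
    v, u, _ = spherical_project(points, H, W)
    point_feats = fused_range_features[:, v, u].t()  # (N, C)
    return torch.cat([points, point_feats], dim=1)   # (N, 3 + C)

# Example: decorate 10,000 random points with an 8-channel feature map.
pts = torch.randn(10000, 3)
feats = torch.randn(8, 64, 512)
decorated = decorate_points(pts, feats)  # shape: (10000, 11)

Because every point maps to exactly one range-image pixel, this gather is a simple index lookup rather than an interpolation, which is one reason range-view fusion avoids the misalignment and cost of cross-domain projection.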