Deeply interactive pillars: Achieve advanced feature enhancement in 3D point clouds with deep interactions



Abstract

Pillar-based feature learning is highly efficient for 3D object detection. However, its aggressive downsampling discards explicit geometric cues, which weakens the model's ability to detect small objects. In addition, the Pillar Feature Encoding (PFE) stage suffers from loss of high-dimensional information and an uneven numerical distribution of features, both of which limit model performance and quantization potential. To address these challenges, we introduce a novel method called Deeply Interactive Pillars (DIP, also referred to as DI-Pillar). First, we propose a self-supplying dual-token mechanism for global feature interaction. By incorporating spatial tokens and semantic tokens, the model interacts efficiently across global features, and the synergy between the two token types improves its ability to represent small targets in complex scenes. Second, we design a new feature fusion module, the 3D Dual-Pool Attention Fusion Module (DP-AF), to refine the max-pooling operation in PointPillars. DP-AF combines dual pooling with the Squeeze-and-Excitation (SE) mechanism to emphasize the important information in the feature maps of 3D point cloud data. Extensive experiments on the KITTI dataset validate the performance of DI-Pillar. It achieves accuracy rates of 59.85%, 53.30%, and 48.12% for pedestrians and 89.07%, 67.70%, and 63.45% for cyclists at the easy, moderate, and hard difficulty levels, respectively. These results demonstrate the effectiveness and robustness of the proposed method.
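The abstract does not specify how DP-AF is built, so the following is only a minimal sketch of the general idea of pairing dual pooling with an SE-style channel gate on the features of a single pillar. Every detail is an assumption made for illustration, not the authors' design: the function names, the choice of max plus mean as the two pooling branches, element-wise summation as the fusion step, the bias-free two-layer gate, and the random weights. It uses only the Python standard library so that it runs without dependencies.

```python
import math
import random


def dual_pool(pillar):
    """Max- and mean-pool a pillar of shape (num_points, channels) over its points."""
    per_channel = list(zip(*pillar))  # transpose to (channels, num_points)
    max_feat = [max(c) for c in per_channel]
    avg_feat = [sum(c) / len(c) for c in per_channel]
    return max_feat, avg_feat


def linear(x, weight):
    """Bias-free dense layer; weight has shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def se_gate(x, w_reduce, w_expand):
    """SE-style excitation: reduce -> ReLU -> expand -> sigmoid, one weight per channel."""
    hidden = [max(0.0, h) for h in linear(x, w_reduce)]
    return [1.0 / (1.0 + math.exp(-z)) for z in linear(hidden, w_expand)]


def dual_pool_se_fusion(pillar, w_reduce, w_expand):
    """Hypothetical fusion: sum the two pooled vectors, then rescale channels with the gate."""
    max_feat, avg_feat = dual_pool(pillar)
    fused = [m + a for m, a in zip(max_feat, avg_feat)]  # assumed fusion by summation
    gate = se_gate(fused, w_reduce, w_expand)
    return [f * g for f, g in zip(fused, gate)]


def random_weights(out_dim, in_dim, rng):
    """Placeholder weights; a trained model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]


rng = random.Random(0)
channels, reduction = 4, 2  # illustrative sizes only
pillar = [[rng.uniform(-1.0, 1.0) for _ in range(channels)] for _ in range(8)]
w_reduce = random_weights(channels // reduction, channels, rng)
w_expand = random_weights(channels, channels // reduction, rng)

out = dual_pool_se_fusion(pillar, w_reduce, w_expand)
print(len(out))  # one fused value per channel
```

Compared with max pooling alone, the mean branch keeps information from every point in the pillar rather than only the per-channel maximum, and the gate lets the model reweight channels after fusion. Whether DP-AF sums or concatenates the two branches, and where it places the gate, cannot be determined from the abstract.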
