MFTFF: multi-frame 3D object detection with temporal feature fusion

Abstract

In autonomous driving, 3D object detection is crucial. LiDAR sensors continuously generate real-time point clouds, which serve as the foundation for 3D object detection. Traditional single-frame 3D detection methods typically rely on the point cloud from a single time step. However, these methods often fail to fully exploit the temporal information in consecutive point cloud sequences, which limits their detection performance. In this work, we introduce a novel architecture called the Multi-frame Temporal Feature Fusion Network (MFTFF). MFTFF enhances the features at the current time step by effectively integrating features from the point cloud sequence. The network includes a Multi-frame Feature Conversion and Alignment Module (MFCA), which aligns the different point cloud frames in the time series to a unified perspective. In addition, we introduce an efficient Multi-frame Feature Fusion Module (MFFF), which performs coarse-grained fusion through concatenation and fine-grained fusion through a temporal attention mechanism. This fusion strategy preserves the essential temporal information and effectively improves 3D object detection performance. After fusion, the final features are stored in a memory bank for continuous updating and iteration, providing rich feature information for subsequent detection tasks. On the nuScenes dataset, MFTFF achieves 68.6% NDS and 62.2% mAP, improving on the baseline method by 1.3% NDS and 1.9% mAP.
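
The abstract describes MFCA as aligning different point cloud frames to a unified perspective. A common way to realize such alignment is to map each past frame into the current frame's coordinate system using the relative ego pose; the sketch below illustrates that idea only. The function name, the 4x4 ego-to-global pose convention, and the point layout are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def align_to_current_frame(points: np.ndarray,
                           pose_past: np.ndarray,
                           pose_current: np.ndarray) -> np.ndarray:
    """Map points from a past LiDAR frame into the current frame.

    points:       (N, 3) array in the past frame's coordinates.
    pose_past:    assumed 4x4 ego-to-global transform at the past time step.
    pose_current: assumed 4x4 ego-to-global transform at the current time step.
    """
    # Relative transform: past frame -> global -> current frame.
    rel = np.linalg.inv(pose_current) @ pose_past
    # Homogeneous coordinates so the translation applies in one matmul.
    homo = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4)
    return (homo @ rel.T)[:, :3]
```

With all frames expressed in the current coordinate system, their feature maps occupy the same spatial grid and can be fused element-wise.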
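The abstract further describes MFFF as a two-stage fusion: coarse-grained fusion by concatenation, then fine-grained fusion with a temporal attention mechanism. The following is a minimal PyTorch sketch of that two-stage idea over aligned BEV feature maps. The module name, tensor shapes, layer choices, and the way the two stages are combined are all assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn as nn

class MultiFrameFeatureFusion(nn.Module):
    """Sketch: coarse fusion by concatenation + 1x1 conv, fine fusion by
    a per-frame temporal attention weight (assumed design, not MFTFF's)."""

    def __init__(self, channels: int, num_frames: int):
        super().__init__()
        # Coarse-grained stage: reduce the concatenated frame features.
        self.reduce = nn.Conv2d(channels * num_frames, channels, kernel_size=1)
        # Fine-grained stage: predict one attention map per frame.
        self.attn = nn.Conv2d(channels * num_frames, num_frames, kernel_size=1)

    def forward(self, frames: list[torch.Tensor]) -> torch.Tensor:
        # frames: list of T aligned BEV features, each of shape (B, C, H, W).
        stacked = torch.cat(frames, dim=1)                    # (B, T*C, H, W)
        coarse = self.reduce(stacked)                         # (B, C, H, W)
        weights = torch.softmax(self.attn(stacked), dim=1)    # (B, T, H, W)
        # Weighted sum of frames: each weight map broadcasts over channels.
        fine = sum(w.unsqueeze(1) * f
                   for w, f in zip(weights.unbind(dim=1), frames))
        return coarse + fine                                  # (B, C, H, W)

# Usage sketch: fuse four aligned frames of 64-channel BEV features.
fusion = MultiFrameFeatureFusion(channels=64, num_frames=4)
frames = [torch.randn(2, 64, 128, 128) for _ in range(4)]
fused = fusion(frames)  # (2, 64, 128, 128), ready for the detection head
```

In the paper's pipeline, the fused feature would then be written back to the memory bank so later time steps can reuse it.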
