FF3F: Feature-Fused 3-Frame Hybrid Neural Network Framework for 3D Tracking of Fast-Moving Small Objects
Abstract
Detecting and tracking fast-moving small objects with LiDAR remains a challenge. Conventional computer vision approaches rely heavily on contour segmentation, which degrades sharply when the accessible features of the targets are limited. To address this limitation, we propose the Feature-Fused Three-Frame (FF3F) detection algorithm, which integrates lightweight low-level neural networks with conditional convolution layers to balance accuracy and efficiency. FF3F fuses pixel drifts, centroid displacements, and velocity estimates within a confidence-aware framework, enabling robust motion estimation even under scarce signals or in noisy environments. The “three-frame” design refers to integrating temporal features across three consecutive frames, thereby strengthening recognition consistency. The approach is tested against an end-to-end three-frame neural-network-based detection (EE3F) model and a traditional feature-based optical flow (FBOF) algorithm, with all data collected using solid-state LiDAR sensors. Performance was evaluated by precision, recall, F1 score (the harmonic mean of precision and recall), and Intersection over Union (IoU). When target objects move at 9–12 m/s and cover approximately 128 pixels, the FF3F model achieved an average recall of 0.89, outperforming the other frameworks (EE3F: 0.78; FBOF: 0.59). FF3F's latency ranged from 27.31 ms to 92.35 ms. In comparison, EE3F exhibited higher latency (59.42 ms – 171.64 ms), while FBOF was faster (17.13 ms – 55.43 ms) but substantially less accurate. These results confirm that FF3F effectively balances accuracy and computational efficiency under visibility constraints.
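The abstract does not specify the fusion rule, so the sketch below is only a minimal illustration of what confidence-aware fusion of the three motion cues over three frames might look like: a confidence-weighted average of pixel drift, centroid displacement, and velocity estimate per frame, then averaged across the three frames. All function and variable names here are hypothetical, not taken from the paper.

```python
import numpy as np

def fuse_motion_cues(pixel_drift, centroid_disp, velocity_est, confidences):
    """Confidence-weighted fusion of three motion cues into one 2D motion
    estimate. Each cue is an (x, y) vector; `confidences` holds one
    non-negative weight per cue. (Illustrative only; the paper's actual
    fusion rule is not given in the abstract.)"""
    cues = np.stack([pixel_drift, centroid_disp, velocity_est])  # shape (3, 2)
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()                          # normalize weights to sum to 1
    return (w[:, None] * cues).sum(axis=0)   # weighted average over the cues

def three_frame_estimate(frames_cues):
    """Aggregate the fused motion over three consecutive frames,
    here by simply averaging the per-frame fused estimates."""
    fused = [fuse_motion_cues(*c) for c in frames_cues]
    return np.mean(fused, axis=0)

# Example: three frames, each carrying (pixel_drift, centroid_disp,
# velocity_est, confidences) for a single tracked object.
frames = [
    (np.array([1.0, 0.2]), np.array([0.9, 0.3]), np.array([1.1, 0.1]), [0.8, 0.6, 0.9]),
    (np.array([1.2, 0.1]), np.array([1.0, 0.2]), np.array([1.3, 0.0]), [0.7, 0.5, 0.9]),
    (np.array([1.1, 0.3]), np.array([1.0, 0.4]), np.array([1.2, 0.2]), [0.9, 0.6, 0.8]),
]
print(three_frame_estimate(frames))  # averaged, confidence-weighted motion
```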
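The evaluation metrics themselves are standard. For concreteness, a minimal reference implementation of precision, recall, F1, and IoU, written here for 2D axis-aligned boxes since the abstract does not specify the box representation:

```python
def precision_recall_f1(tp, fp, fn):
    """Detection metrics from true/false positive and false negative counts;
    F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Illustrative counts only (not the paper's data): 89 hits out of 100
# ground-truth objects gives recall 0.89, matching FF3F's reported average.
p, r, f1 = precision_recall_f1(tp=89, fp=11, fn=11)
print(round(p, 2), round(r, 2), round(f1, 2))
print(round(iou((0, 0, 10, 10), (5, 5, 15, 15)), 3))  # 25 / 175 ≈ 0.143
```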