Enhancing Pure-Vision BEV 3D Perception with Hybrid Data-Feature Optimization
Abstract
Bird’s eye view (BEV)-based approaches that rely solely on cameras offer a cost-effective alternative to LiDAR-based solutions for 3D perception and have attracted significant research attention. However, existing methods are limited in data diversity, in contextual modeling across multi-camera views, and in multi-scale feature fusion. To address these challenges, this paper proposes HDO-BEV, a Hybrid Data-Feature Optimization-enhanced architecture for pure-vision BEV 3D perception. HDO-BEV integrates three novel modules: RandomFlip for data augmentation, HS-FPN for optimized multi-scale feature fusion, and ContextBlock for context enhancement. Experiments on the nuScenes-mini dataset show that HDO-BEV achieves 0.388 mAP and 0.472 NDS, outperforming the baseline SA-BEV by 0.9%. These results validate that targeted architectural enhancements can significantly advance pure-vision BEV 3D environmental sensing for scalable autonomous driving systems. The source code for this study is available at: https://github.com/zaixianbaipiao/Enhancing-Pure-Vision-BEV-3D-Perception-with-Hybrid-Data-Feature-Optimization [DOI: 10.5281/zenodo.18587515]
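The two reported numbers are linked by the standard nuScenes Detection Score, NDS = (1/10)[5·mAP + Σ(1 − min(1, mTP))], where the sum runs over the five true-positive error metrics mATE, mASE, mAOE, mAVE and mAAE. The sketch below illustrates that published metric definition only; it is not taken from the HDO-BEV code, and the function names are hypothetical.

```python
"""Sketch of the nuScenes Detection Score (NDS), which combines mAP with five TP error metrics."""

# The five true-positive error metrics defined by the nuScenes detection benchmark.
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_detection_score(m_ap: float, tp_errors: dict[str, float]) -> float:
    """Return NDS = (5 * mAP + sum(1 - min(1, mTP))) / 10 over the five TP metrics."""
    missing = [name for name in TP_METRICS if name not in tp_errors]
    if missing:
        raise ValueError(f"missing TP metrics: {missing}")
    # Each error is clipped at 1, so a very poor TP metric contributes 0 rather than a negative term.
    tp_score = sum(1.0 - min(1.0, tp_errors[name]) for name in TP_METRICS)
    return (5.0 * m_ap + tp_score) / 10.0


def implied_tp_score(m_ap: float, nds: float) -> float:
    """Invert the formula: the summed TP term (maximum 5) needed to reach `nds` at `m_ap`."""
    return 10.0 * nds - 5.0 * m_ap
```

Because mAP carries half the total weight, NDS also rewards accurate localization, scale, orientation, velocity and attribute estimates. As a rough consistency check on the abstract's figures, `implied_tp_score(0.388, 0.472)` gives about 2.78 out of a possible 5, subject to rounding of the published values.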