Enhancing Pure-Vision BEV 3D Perception with Hybrid Data-Feature Optimization

Abstract

Bird’s eye view (BEV)-based approaches that rely solely on cameras offer a cost-effective alternative to LiDAR-based solutions for 3D perception and have attracted significant research attention. However, existing methods are limited in data diversity, contextual modeling across multi-camera views, and multi-scale feature fusion. To address these challenges, this paper proposes HDO-BEV, a Hybrid Data-Feature Optimization-enhanced architecture for pure-vision BEV 3D perception. HDO-BEV integrates three novel modules: RandomFlip for data augmentation, HS-FPN for optimized multi-scale feature fusion, and ContextBlock for context enhancement. Experiments on the nuScenes-mini dataset show that HDO-BEV achieves 0.388 mAP and 0.472 NDS, outperforming the baseline SA-BEV by 0.9%. These results validate that targeted architectural enhancements can significantly advance pure-vision BEV 3D environmental sensing for scalable autonomous driving systems. The source code for this study is available at: https://github.com/zaixianbaipiao/Enhancing-Pure-Vision-BEV-3D-Perception-with-Hybrid-Data-Feature-Optimization [DOI: 10.5281/zenodo.18587515]
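The abstract does not describe the internals of ContextBlock. The sketch below assumes it follows the GCNet-style global context block, the design exposed as `ContextBlock` in mmcv. Under that assumption, a single forward pass works in three stages. First, a softmax attention map pools the feature map into one context vector per image. Second, a bottleneck (1x1 conv, LayerNorm, ReLU, 1x1 conv) transforms that vector. Third, the result is added back to every spatial position. The class name `ContextBlockSketch`, the bottleneck `ratio`, and the random initialization are illustrative assumptions, not code taken from the HDO-BEV repository. The example uses plain NumPy, and each 1x1 convolution is written as a matrix product over channels.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over a 1-D array.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def layer_norm(v, gamma, beta, eps=1e-5):
    # LayerNorm over the bottleneck channels of a single context vector.
    return gamma * (v - v.mean()) / np.sqrt(v.var() + eps) + beta


class ContextBlockSketch:
    """Hypothetical GCNet-style global context block (attention pooling + channel add)."""

    def __init__(self, channels, ratio=0.25, seed=0):
        rng = np.random.default_rng(seed)
        hidden = max(1, int(channels * ratio))
        # 1x1 conv C -> 1 producing one attention logit per spatial position.
        self.w_mask = rng.standard_normal(channels) * 0.1
        # Bottleneck transform: 1x1 conv C -> hidden, LayerNorm, ReLU, 1x1 conv hidden -> C.
        self.w1 = rng.standard_normal((hidden, channels)) * 0.1
        self.b1 = np.zeros(hidden)
        self.gamma = np.ones(hidden)
        self.beta = np.zeros(hidden)
        self.w2 = rng.standard_normal((channels, hidden)) * 0.1
        self.b2 = np.zeros(channels)

    def __call__(self, x):
        # x: feature map of shape (C, H, W) for a single camera view.
        c, h, w = x.shape
        flat = x.reshape(c, h * w)
        # Context modeling: softmax weights over all H*W positions.
        attn = softmax(self.w_mask @ flat)
        context = flat @ attn  # shape (C,), weighted sum of features
        # Transform: bottleneck with LayerNorm and ReLU.
        t = self.w1 @ context + self.b1
        t = np.maximum(layer_norm(t, self.gamma, self.beta), 0.0)
        t = self.w2 @ t + self.b2
        # Fusion: broadcast-add the same context vector to every position.
        return x + t[:, None, None]


x = np.random.default_rng(1).standard_normal((8, 4, 6))
block = ContextBlockSketch(channels=8)
y = block(x)
print(y.shape)  # (8, 4, 6)
```

Because the attention map is query-independent, the block computes one image-wide summary per channel at a cost linear in H×W. For this reason GCNet-style blocks are cheaper than full non-local attention. If the final 1x1 conv is zero-initialized, as GCNet does, the block starts out as an identity mapping.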