GCL-BEV: Enhancing Pure Vision 3D Detection with Motion Priors and View-Consistency Learning
Abstract
Temporal fusion has become a de facto standard in vision-centric Bird’s-Eye-View (BEV) perception, enabling velocity estimation and occlusion mitigation. However, existing paradigms typically rely on rigid geometric alignment (e.g., ego-pose warping) to aggregate historical features. We show that this assumption is fragile: under aggressive ego-motion, such as rapid turning, non-linear distortion of visual features leads to significant spatial misalignment, causing prediction jitter and feature smearing. To bridge this gap, we propose GCL-BEV, a robust detection framework that enforces geometric consistency through both architectural design and optimization constraints. First, we introduce a Geometric-Aware Feature Enhancement (GAFE) module. Unlike standard deformable convolutions that infer offsets from visual appearance, GAFE explicitly utilizes kinematic priors (ego-motion) to guide the dynamic deformation of the receptive field, ensuring feature alignment before temporal fusion. Second, we propose a View-Consistency Learning (VCL) objective. Formulated as a Siamese equivariance constraint, VCL compels the backbone to learn rotation-invariant representations during training, enhancing robustness against viewpoint perturbations with strictly zero inference overhead. Extensive experiments on the nuScenes dataset demonstrate that GCL-BEV achieves state-of-the-art performance among ResNet-101 based methods (57.8% NDS, 46.2% mAP). Crucially, our method reduces Orientation Error (mAOE) by 5.4% relative to the baseline, validating its superiority in maintaining geometric stability under complex driving maneuvers.
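To make the role of kinematic priors concrete, the following is a minimal numpy sketch of how SE(2) ego-motion can be converted into per-cell BEV sampling offsets, which a module like GAFE could feed into a deformable convolution instead of predicting offsets from appearance alone. This is an illustration under our own assumptions (grid layout, cell resolution, and the function name `ego_warp_offsets` are hypothetical, not taken from the paper):

```python
import numpy as np

def ego_warp_offsets(h, w, yaw, tx, ty, cell=0.5):
    """Per-cell sampling offsets (in cell units) mapping the current BEV
    grid back into the previous frame under planar ego-motion.

    yaw: heading change [rad]; tx, ty: translation [m]; cell: metres/cell.
    Returns (dx, dy), each of shape (h, w).
    """
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # grid-cell centres in metres, with the ego vehicle at the BEV centre
    px = (xs - w / 2 + 0.5) * cell
    py = (ys - h / 2 + 0.5) * cell
    c, s = np.cos(yaw), np.sin(yaw)
    # inverse rigid transform: where each current cell lay in the previous frame
    qx = c * (px - tx) + s * (py - ty)
    qy = -s * (px - tx) + c * (py - ty)
    return (qx - px) / cell, (qy - py) / cell
```

Under pure translation the offsets are constant across the grid, whereas under rotation they grow with distance from the ego vehicle; this distance-dependent displacement is exactly the misalignment that rigid feature warping leaves behind during rapid turning.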
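The Siamese equivariance constraint behind VCL can likewise be sketched in a few lines: one branch encodes a rotated input, the other rotates the encoded features, and the loss penalises their disagreement. The sketch below is a toy numpy illustration under our own assumptions (the 90-degree rotation group, the function name `vcl_loss`, and the toy backbones are illustrative, not the paper's implementation):

```python
import numpy as np

def vcl_loss(backbone, x, k=1):
    """Siamese view-consistency loss: mean squared gap between
    encode-after-rotate and rotate-after-encode (equivariance residual)."""
    f_of_rot = backbone(np.rot90(x, k, axes=(-2, -1)))  # branch 1
    rot_of_f = np.rot90(backbone(x), k, axes=(-2, -1))  # branch 2
    return float(np.mean((f_of_rot - rot_of_f) ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))  # toy feature map: (channels, H, W)

# an elementwise backbone commutes with rotation, so the loss vanishes
relu = lambda z: np.maximum(z, 0.0)
print(vcl_loss(relu, x))  # → 0.0

# a spatially biased backbone breaks equivariance and is penalised
mask = np.arange(64, dtype=float).reshape(8, 8)
biased = lambda z: z * mask
print(vcl_loss(biased, x) > 0.0)  # → True
```

Because the constraint is applied only as a training loss on the backbone's features, it adds no parameters or computation at inference time, consistent with the zero-overhead claim in the abstract.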