Decoupling Geometry and Content: A Reliability-Aware Attention Framework for Robust BEV Perception

Abstract

Multi-camera Bird’s-Eye-View (BEV) perception is fundamental to autonomous driving. However, existing query-based methods suffer from a representational coupling bottleneck: a single BEV query must simultaneously encode spatial invariance for geometric localization and semantic sensitivity for content assessment. The shallow linear projections in standard attention mechanisms struggle to disentangle these conflicting attributes, which leads to sampling instability. Furthermore, the conventional unconditional feature fusion paradigm lacks explicit quality assessment, allowing low-quality observations to contaminate temporal features. To address these issues, we propose a Self-Regulating Attention (SRA) framework. A Structurally Decoupled Geometry-Content Attention (SDGCA) module replaces the shallow linear projections of standard attention with dedicated sub-networks, enabling it to actively decouple and extract task-specific representations from the BEV query. We posit that a highly reliable feature update should stem not only from a geometrically sound sampling process but also from a high-confidence content aggregation. Guided by this insight, we introduce Cooperative Assessment Gating (CAG), which leverages the affine transformation matrix and the information entropy produced by SDGCA to rigorously evaluate the reliability of each proposed feature update along these two orthogonal dimensions. An adaptive gating module then fuses these evaluation signals with the query context to dynamically regulate the intensity of the feature update, significantly improving update reliability. Experiments on the nuScenes dataset demonstrate that SRA-Former achieves a state-of-the-art mAP of 0.425 and significantly reduces orientation estimation errors compared to strong baselines. Crucially, these improvements are obtained without relying on explicit depth or height supervision.
These results indicate that structural decoupling and reliability-aware gating alone can establish a high-performance BEV perception paradigm, offering a robust alternative to prior-dependent methods.
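The abstract does not give the equations behind CAG, so the following is only a minimal sketch, in pure Python, of how a reliability-gated query update of this general kind could be wired together. Every scoring choice in it is an assumption made for illustration: the content-confidence signal is taken as one minus the normalized Shannon entropy of the attention weights, the geometric signal is a hypothetical inverse condition number of the 2×2 linear part of the predicted affine transform, and the gate is a single logistic unit over the query and the two scores. The names `content_confidence`, `geometric_reliability`, `gated_update`, `gate_w` and `gate_b` are hypothetical and are not taken from SRA-Former.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over attention logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def content_confidence(weights: Sequence[float]) -> float:
    """One minus the normalized Shannon entropy of the attention weights.

    1.0 when all mass is on one sampling point, ~0.0 when weights are uniform.
    """
    k = len(weights)
    if k < 2:
        return 1.0
    entropy = -sum(w * math.log(w) for w in weights if w > 0.0)
    return 1.0 - entropy / math.log(k)


def geometric_reliability(a: Sequence[Sequence[float]]) -> float:
    """Hypothetical score: sigma_min / sigma_max of a 2x2 linear part A.

    1.0 for a rotation or uniform scaling, 0.0 for a degenerate transform
    that collapses the sampling pattern onto a line or a point.
    """
    (p, q), (r, s) = a
    fro2 = p * p + q * q + r * r + s * s  # sigma_max^2 + sigma_min^2
    det = abs(p * s - q * r)              # sigma_max * sigma_min
    if fro2 == 0.0:
        return 0.0
    disc = math.sqrt(max(fro2 * fro2 - 4.0 * det * det, 0.0))
    s_max = math.sqrt((fro2 + disc) / 2.0)
    s_min = det / s_max  # avoids cancellation in (fro2 - disc)
    return s_min / s_max


def sigmoid(x: float) -> float:
    """Overflow-safe logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_update(
    query: Sequence[float],
    proposal: Sequence[float],
    attn_logits: Sequence[float],
    affine: Sequence[Sequence[float]],
    gate_w: Sequence[float],
    gate_b: float,
) -> Tuple[List[float], float]:
    """Residual update q + g * proposal, with g from a reliability gate.

    gate_w holds hypothetical weights for [*query, geo_score, content_conf].
    """
    if len(query) != len(proposal) or len(gate_w) != len(query) + 2:
        raise ValueError("dimension mismatch between query, proposal and gate_w")
    conf = content_confidence(softmax(attn_logits))
    geo = geometric_reliability(affine)
    features = list(query) + [geo, conf]
    z = sum(w * f for w, f in zip(gate_w, features)) + gate_b
    g = sigmoid(z)
    return [qi + g * di for qi, di in zip(query, proposal)], g


if __name__ == "__main__":
    query, proposal = [0.5, -0.2], [1.0, 1.0]
    gate_w, gate_b = [0.0, 0.0, 4.0, 4.0], -4.0  # hand-set, not learned
    # Peaked attention + well-conditioned transform: high reliability.
    _, g_hi = gated_update(query, proposal, [8.0, 0.0, 0.0, 0.0],
                           [[1.0, 0.0], [0.0, 1.0]], gate_w, gate_b)
    # Flat attention + collapsed transform: low reliability.
    _, g_lo = gated_update(query, proposal, [0.0, 0.0, 0.0, 0.0],
                           [[1.0, 0.0], [0.0, 0.0]], gate_w, gate_b)
    print(f"reliable proposal gate:   {g_hi:.3f}")
    print(f"unreliable proposal gate: {g_lo:.3f}")
```

In this toy setting, a sharply peaked attention distribution combined with a well-conditioned transform lets most of the proposed update through, while a flat distribution with a collapsed transform nearly suppresses it. In a trained model the gate weights would be learned, and the actual CAG may combine its signals quite differently.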