Research on Cooperative Vehicle–Infrastructure Perception Integrating Enhanced Point-Cloud Features and Spatial Attention
Abstract
Vehicle–infrastructure cooperative perception (VICP) extends the sensing capability of single-vehicle systems by integrating multi-source information from onboard and roadside sensors, thereby alleviating limitations in sensing range and field-of-view coverage. However, in complex urban environments, the robustness of such systems—particularly in terms of blind-spot coverage and feature representation—is severely affected by static and dynamic occlusions, as well as by distance-induced sparsity in point-cloud data. To address these challenges, a 3D object detection framework incorporating point-cloud feature enhancement and spatially adaptive fusion is proposed. First, to mitigate feature degradation under sparse and occluded conditions, a Redefined Squeeze-and-Excitation Network (R-SENet) attention module is integrated into the feature encoding stage. This module employs a dual-dimensional squeeze-and-excitation mechanism operating across pillars and intra-pillar points, enabling adaptive recalibration of critical geometric features. In addition, a Feature Pyramid Backbone Network (FPB-Net) is designed to improve target representation across varying distances through multi-scale feature extraction and cross-layer aggregation. Second, to address feature heterogeneity and spatial misalignment between heterogeneous sensing agents, a Spatial Adaptive Feature Fusion (SAFF) module is introduced. By explicitly encoding the origin of features and leveraging spatial attention mechanisms, the SAFF module enables dynamic weighting and complementary fusion between fine-grained vehicle-side features and globally informative roadside semantics. Extensive experiments conducted on the DAIR-V2X benchmark and a custom dataset demonstrate that the proposed approach outperforms several state-of-the-art methods.
Specifically, Average Precision (AP) scores of 0.762 and 0.694 are achieved at an IoU threshold of 0.5, while AP scores of 0.617 and 0.563 are obtained at an IoU threshold of 0.7 on the two datasets, respectively. Furthermore, the proposed framework maintains real-time inference performance, highlighting its effectiveness and practical potential for real-world deployment.
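To make the dual-dimensional squeeze-and-excitation idea concrete, the sketch below applies channel-gating along the two axes the abstract names: across pillars and across intra-pillar points. It is a minimal NumPy illustration under assumed shapes (a `(P, N, C)` pillar tensor and single-layer excitation weights `w_pillar`, `w_point`), not the paper's exact R-SENet architecture, which may use learned reduction layers and other details not given in the abstract.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dual_se(pillar_feats, w_pillar, w_point):
    """Dual-dimensional squeeze-and-excitation over a pillar tensor.

    pillar_feats: (P, N, C) -- P pillars, N points per pillar, C channels.
    w_pillar, w_point: (C, C) excitation weights (hypothetical single-layer
    gates; the paper's exact layer sizes are not specified in the abstract).
    """
    # Pillar-axis SE: squeeze each pillar over its points, then gate
    # that pillar's channels with values in (0, 1).
    pillar_desc = pillar_feats.mean(axis=1)          # (P, C)
    pillar_gate = sigmoid(pillar_desc @ w_pillar)    # (P, C)
    out = pillar_feats * pillar_gate[:, None, :]

    # Point-axis SE: squeeze each point over its channels, then gate
    # individual points inside each pillar.
    point_desc = pillar_feats @ w_point              # (P, N, C)
    point_gate = sigmoid(point_desc.mean(axis=-1))   # (P, N)
    return out * point_gate[:, :, None]
```

Because both gates lie in (0, 1), the recalibration can only attenuate features, which is the standard squeeze-and-excitation behavior: informative pillars and points are preserved while weakly informative ones are suppressed.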
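The SAFF module's two stated ingredients, explicit origin encoding and per-location spatial attention, can likewise be sketched for two spatially aligned bird's-eye-view feature maps. The shapes, the extra origin channel (0 = vehicle, 1 = roadside), and the single linear scoring vector `w_att` are illustrative assumptions; the actual module's attention network is not described in the abstract.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def saff_fuse(veh_bev, inf_bev, w_att):
    """Spatially adaptive fusion of two aligned BEV maps of shape (C, H, W).

    w_att: (C + 1,) linear scoring weights (hypothetical); the +1 accounts
    for the appended origin-encoding channel.
    """
    C, H, W = veh_bev.shape
    # Tag each map with its origin so the attention can tell sources apart.
    veh_tag = np.concatenate([veh_bev, np.zeros((1, H, W))], axis=0)
    inf_tag = np.concatenate([inf_bev, np.ones((1, H, W))], axis=0)
    # Per-cell attention score for each source (projection over channels).
    scores = np.stack([
        np.tensordot(w_att, veh_tag, axes=([0], [0])),   # (H, W)
        np.tensordot(w_att, inf_tag, axes=([0], [0])),   # (H, W)
    ])                                                    # (2, H, W)
    weights = softmax(scores, axis=0)                     # sums to 1 per cell
    # Convex per-cell combination: fine-grained vehicle features and
    # globally informative roadside features are weighted dynamically.
    return weights[0] * veh_bev + weights[1] * inf_bev    # (C, H, W)
```

Since the softmax weights sum to one at every cell, each fused value is a convex combination of the vehicle-side and roadside features at that location, so neither source can be amplified beyond its own magnitude, only reweighted.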