A CLIP-Based Uncertainty Modal Modeling (UMM) Framework for Pedestrian Re-Identification in Autonomous Driving

Abstract

Re-Identification (ReID) is a critical technology in intelligent perception systems, especially within autonomous driving, where onboard cameras must identify pedestrians across views and time in real time to support safe navigation and trajectory prediction. However, the presence of uncertain or missing input modalities (such as RGB, infrared, sketches, or textual descriptions) poses significant challenges to conventional ReID approaches. While large-scale pre-trained models offer strong multimodal semantic modeling capabilities, their computational overhead limits practical deployment in resource-constrained environments. To address these challenges, we propose a lightweight Uncertainty Modal Modeling (UMM) framework, which integrates a multimodal token mapper, a synthetic modality augmentation strategy, and a cross-modal cue interactive learner. Together, these components enable unified feature representation, mitigate the impact of missing modalities, and extract complementary information across different data types. Additionally, UMM leverages CLIP's vision-language alignment ability to fuse multimodal inputs efficiently without extensive fine-tuning. Experimental results demonstrate that UMM achieves strong robustness, generalization, and computational efficiency under uncertain modality conditions, offering a scalable and practical solution for pedestrian re-identification in autonomous driving scenarios.
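
To make the component roles concrete, the sketch below shows one plausible reading of the pipeline in PyTorch: per-modality features (assumed to come from a frozen CLIP encoder) are projected into a shared token space by a token mapper, learnable placeholder tokens stand in for missing modalities, and an attention-based learner fuses the tokens into a single identity embedding. The class names, layer sizes, and use of learnable missing-modality tokens are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class MultimodalTokenMapper(nn.Module):
    """Project per-modality CLIP features into a shared token space (hypothetical sizes)."""

    def __init__(self, clip_dim=512, token_dim=256, num_modalities=4):
        super().__init__()
        self.projections = nn.ModuleList(
            [nn.Linear(clip_dim, token_dim) for _ in range(num_modalities)]
        )
        # Learnable placeholder tokens substitute for modalities that are absent.
        self.missing_tokens = nn.Parameter(torch.zeros(num_modalities, token_dim))

    def forward(self, features, present_mask):
        # features: (batch, num_modalities, clip_dim) frozen-CLIP embeddings, zeros where missing
        # present_mask: (batch, num_modalities) booleans, True where the modality was observed
        tokens = torch.stack(
            [proj(features[:, m]) for m, proj in enumerate(self.projections)], dim=1
        )
        mask = present_mask.unsqueeze(-1).float()
        return mask * tokens + (1.0 - mask) * self.missing_tokens


class CrossModalCueLearner(nn.Module):
    """Fuse modality tokens with self-attention and pool them into one identity embedding."""

    def __init__(self, token_dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(token_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(token_dim)

    def forward(self, tokens):
        fused, _ = self.attn(tokens, tokens, tokens)
        return self.norm(fused + tokens).mean(dim=1)  # (batch, token_dim)


if __name__ == "__main__":
    batch, num_modalities, clip_dim = 2, 4, 512
    feats = torch.randn(batch, num_modalities, clip_dim)   # stand-in for frozen CLIP features
    present = torch.tensor([[True, True, False, True],
                            [True, False, False, True]])   # e.g. sketch / text missing
    mapper = MultimodalTokenMapper(clip_dim=clip_dim)
    learner = CrossModalCueLearner()
    embedding = learner(mapper(feats, present))
    print(embedding.shape)  # torch.Size([2, 256])
```

Keeping the CLIP encoders frozen and training only the lightweight mapper and learner is one way such a design could stay efficient enough for onboard deployment, consistent with the abstract's claim of avoiding extensive fine-tuning.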
