Multi-Scale Feature Fusion for Cross-Modality Person Re-Identification: The MSJLNet Approach
Abstract
Visible-infrared person re-identification (VI-ReID) is challenging due to the large modality discrepancy between visible and infrared images. Traditional two-stream networks often fail to preserve the semantic guidance provided by data augmentation as network depth increases. To address this, we propose the Multi-Scale Joint Learning Network (MSJLNet), which employs a novel four-stream architecture that separates the data-augmented branches from the original branches and focuses on extracting robust, color-agnostic modal features. An Information Purification Module (IPM) with a channel attention mechanism dynamically filters noise and suppresses redundant color information in the augmented branches. A Joint Semantic Learning Module (JSLM) then fuses global detail features with color-agnostic features, improving the model's discriminative ability. Extensive experiments on the SYSU-MM01 and RegDB datasets demonstrate the superior performance of MSJLNet, which achieves 79.94% Rank-1 accuracy and 74.96% mAP on SYSU-MM01, and 93.14% Rank-1 accuracy and 87.22% mAP on RegDB. The proposed approach offers new insights for enhancing cross-modality feature learning. Code is available at https://github.com/1849714926/MSJLNet.
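For illustration only, below is a minimal PyTorch sketch of what a channel-attention-based purification step of the kind the abstract describes might look like. The class name ChannelAttentionIPM, the squeeze-and-excitation structure, and the reduction hyperparameter are all assumptions for exposition, not the authors' implementation; see the linked repository for the actual MSJLNet code.

import torch
import torch.nn as nn

class ChannelAttentionIPM(nn.Module):
    # Hypothetical sketch of an Information Purification Module (IPM),
    # assuming a squeeze-and-excitation style channel attention: per-channel
    # statistics produce gating weights so that redundant (e.g. color-specific)
    # channels in the augmented branches can be suppressed.
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: one statistic per channel
        self.fc = nn.Sequential(             # excitation: learn channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # reweight channels, down-weighting noisy/redundant ones

# Example: purify a batch of augmented-branch feature maps
feats = torch.randn(4, 256, 24, 12)
ipm = ChannelAttentionIPM(channels=256)
print(ipm(feats).shape)  # torch.Size([4, 256, 24, 12])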