ADT-Net: Adaptive Transformation-Driven Text-Based Person Search Network for Enhancing Cross-Modal Retrieval Robustness

Abstract

Text-based person search aims to retrieve images of a person that match a given textual description. The core challenge lies in mapping images and texts into a unified semantic space. This paper introduces ADT-Net, a novel framework designed to address the excessive intra-class variance and insufficient inter-class variance caused by lighting variations. ADT-Net comprises two key modules: Invariant Representation Learning (IRL), which employs style-transfer strategies and multi-scale alignment to learn visually invariant features, and Dynamic Matching Alignment (DMA), which introduces nonlinear transformations and a learnable dynamic temperature parameter to optimize the prediction distribution. Experiments on multiple benchmark datasets show that ADT-Net outperforms current mainstream baselines in retrieval accuracy and generalization, and significantly improves the robustness of cross-modal person retrieval under varying lighting conditions and shooting angles. Code is available at https://github.com/2Elian/ADT-Net.
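The abstract does not specify how DMA's nonlinear transformation and learnable dynamic temperature are realized. As a rough illustration only, under assumed design choices (a shared nonlinear projection head and a log-parameterized temperature scaling the image-text similarity logits of an InfoNCE-style matching loss; the class and attribute names below are hypothetical, not taken from the paper), a minimal PyTorch sketch might look like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicMatchingLoss(nn.Module):
    """Hypothetical sketch of a DMA-style objective: a small nonlinear
    projection applied to both modalities, plus a learnable temperature
    that rescales the similarity logits before the softmax."""

    def __init__(self, dim: int = 512, init_temp: float = 0.07):
        super().__init__()
        # Nonlinear transformation shared by the image and text branches
        # (assumption; the paper may use separate or different heads).
        self.proj = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        # Learnable log-temperature; exp() keeps the temperature positive.
        self.log_temp = nn.Parameter(torch.log(torch.tensor(init_temp)))

    def forward(self, img_feats: torch.Tensor, txt_feats: torch.Tensor):
        img = F.normalize(self.proj(img_feats), dim=-1)
        txt = F.normalize(self.proj(txt_feats), dim=-1)
        # Pairwise cosine similarities, sharpened by the dynamic temperature.
        logits = img @ txt.t() / self.log_temp.exp()
        targets = torch.arange(img.size(0), device=img.device)
        # Symmetric image-to-text and text-to-image cross-entropy.
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))

Parameterizing the temperature in log space keeps it positive without explicit clamping; whether ADT-Net follows this convention, or optimizes the temperature differently, is not stated in the abstract.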
