A local-global Transformer-based model for Person Re-Identification
Abstract
Person re-identification (ReID) aims to recognize a specific individual across different camera views. Recent work has shown that both Transformer-based and CNN-based methods deliver competitive performance. However, Transformer-based methods tend to overlook local features because they primarily process the input sequence holistically rather than focusing on individual elements or small groups within it. To address this limitation, we introduce a Transformer-based person ReID model that integrates local and global features. A Local Attention Module captures fine-grained features, which are then combined with global features to improve recognition accuracy. Because positional information is important in image data, relative position encoding is incorporated within the Local Attention Module. This encoding captures the relative positional relationships between tokens in an image, improving the model's understanding of the image's structure. Experimental results show that our model achieves improved performance on the Market-1501 and DukeMTMC-reID benchmark datasets for person ReID.
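The abstract does not give the internals of the Local Attention Module, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes non-overlapping square windows over the patch grid, a learned scalar bias per relative offset added to the attention scores, identity query/key/value projections for brevity, and fusion of local and global features by concatenating mean-pooled vectors. The names `relative_position_index`, `local_attention`, `local_global_descriptor`, and `bias_table` are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relative_position_index(w):
    # For every pair of tokens in a w x w window, map the 2-D offset
    # (dy, dx) to one index into a bias table of size (2w - 1) ** 2.
    ys, xs = np.meshgrid(np.arange(w), np.arange(w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)   # (w*w, 2)
    rel = coords[:, None, :] - coords[None, :, :]         # offsets in [-(w-1), w-1]
    rel = rel + (w - 1)                                   # shift to [0, 2w-2]
    return rel[..., 0] * (2 * w - 1) + rel[..., 1]        # (w*w, w*w)

def local_attention(tokens, grid_hw, w, bias_table):
    # tokens: (H*W, D) patch tokens in row-major grid order.
    # Assumption: Q = K = V = tokens (no learned projections).
    H, W = grid_hw
    N, D = tokens.shape
    assert N == H * W and H % w == 0 and W % w == 0
    # Partition the grid into non-overlapping w x w windows.
    x = tokens.reshape(H // w, w, W // w, w, D)
    x = x.transpose(0, 2, 1, 3, 4).reshape(-1, w * w, D)
    # Scaled dot-product scores plus the relative position bias.
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(D)
    scores = scores + bias_table[relative_position_index(w)]
    out = softmax(scores, axis=-1) @ x
    # Undo the window partition to restore the original token order.
    out = out.reshape(H // w, W // w, w, w, D)
    return out.transpose(0, 2, 1, 3, 4).reshape(N, D)

def local_global_descriptor(tokens, grid_hw, w, bias_table):
    # Assumption: fuse by concatenating mean-pooled global and local features.
    local_feat = local_attention(tokens, grid_hw, w, bias_table).mean(axis=0)
    global_feat = tokens.mean(axis=0)
    return np.concatenate([global_feat, local_feat])

rng = np.random.default_rng(0)
H, W, D, w = 4, 2, 8, 2
tokens = rng.normal(size=(H * W, D))
bias_table = rng.normal(size=((2 * w - 1) ** 2,))
descriptor = local_global_descriptor(tokens, (H, W), w, bias_table)
print(descriptor.shape)
```

Because the bias depends only on the offset between two tokens, the same table entry is reused wherever that offset occurs inside a window, which is what distinguishes relative from absolute position encoding. A real model would add learned projections, multiple heads, and a task-specific fusion step.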