Enhancing Cross-Modal Retrieval via Label Graph Optimization and Hybrid Loss Functions
Abstract
Cross-modal retrieval, particularly image-text matching, is crucial in multimedia analysis and artificial intelligence, with applications in intelligent search and human-computer interaction. Current methods often overlook the rich semantic relationships between labels, which limits the discriminability of the learned representations. We introduce a Two-Layer Graph Convolutional Network (L2-GCN) to model label correlations, together with a hybrid loss function, Circle-Soft, that improves cross-modal alignment and discriminability. Evaluated on the NUS-WIDE, MIRFlickr, and MS-COCO datasets, our approach achieves state-of-the-art performance, demonstrating its effectiveness and robustness. The source code is available at https://github.com/buzzcut619/L2-GCN-CIRCLE-SOFT
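The idea of modeling label correlations with a two-layer GCN can be illustrated with a minimal sketch. This is an assumption-laden toy example of the general technique (normalized adjacency propagation with two weight layers), not the paper's actual implementation; the layer sizes, the random adjacency, and all variable names are hypothetical.

```python
# Toy sketch of a two-layer GCN over a label co-occurrence graph.
# All dimensions and the adjacency values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
num_labels, in_dim, hid_dim, out_dim = 5, 8, 16, 8

# A: symmetric binary label co-occurrence adjacency (hypothetical values).
A = (rng.random((num_labels, num_labels)) > 0.5).astype(float)
A = np.maximum(A, A.T)

# Normalized adjacency with self-loops: A_hat = D^{-1/2} (A + I) D^{-1/2}.
A_tilde = A + np.eye(num_labels)
d = A_tilde.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

# X: initial label embeddings (word vectors would be typical in practice).
X = rng.standard_normal((num_labels, in_dim))
W0 = rng.standard_normal((in_dim, hid_dim)) * 0.1
W1 = rng.standard_normal((hid_dim, out_dim)) * 0.1

H1 = np.maximum(A_hat @ X @ W0, 0.0)  # layer 1 with ReLU activation
H2 = A_hat @ H1 @ W1                  # layer 2: refined label embeddings

print(H2.shape)  # one refined embedding per label
```

Each label's output embedding mixes information from its graph neighbors twice (once per layer), which is how correlated labels end up with correlated representations.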