Enhancing Cross-Modal Retrieval via Label Graph Optimization and Hybrid Loss Functions


Abstract

Cross-modal retrieval, particularly image-text matching, is crucial in multimedia analysis and artificial intelligence, with applications in intelligent search and human-computer interaction. Current methods often overlook the rich semantic relationships between labels, which limits the discriminability of the learned representations. We introduce a Two-Layer Graph Convolutional Network (L2-GCN) to model label correlations and a hybrid loss function, Circle-Soft, to enhance cross-modal alignment and discriminability. Our approach, evaluated on the NUS-WIDE, MIRFlickr, and MS-COCO datasets, achieves state-of-the-art performance, demonstrating its effectiveness and robustness. The source code is available at https://github.com/buzzcut619/L2-GCN-CIRCLE-SOFT
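The core propagation step of a two-layer GCN over a label graph can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the toy label co-occurrence adjacency matrix, the embedding dimensions, and the random weight initialization are all assumptions for demonstration; only the two-layer graph-convolution structure itself is standard.

```python
import numpy as np

def normalize_adjacency(A):
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, the usual GCN form.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def two_layer_gcn(A, X, W1, W2):
    # Z = Â · ReLU(Â · X · W1) · W2 : two rounds of neighborhood mixing.
    A_norm = normalize_adjacency(A)
    H = np.maximum(A_norm @ X @ W1, 0.0)  # layer 1 + ReLU
    return A_norm @ H @ W2                # layer 2 (linear output)

# Toy label graph with 3 labels; labels 0 and 1 co-occur (hypothetical data).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))    # initial label embeddings (e.g., word vectors)
W1 = rng.normal(size=(4, 8))   # layer-1 weights (illustrative sizes)
W2 = rng.normal(size=(8, 4))   # layer-2 weights
Z = two_layer_gcn(A, X, W1, W2)
print(Z.shape)  # one refined embedding per label: (3, 4)
```

In a retrieval pipeline of this kind, the refined label embeddings `Z` would typically be matched against image and text features, with the loss (here, the paper's Circle-Soft hybrid) driving alignment between modalities.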
