Research on Multimodal Retrieval System of e-Commerce Platform Based on Pre-Training Model
Abstract
In this paper, we propose a multimodal retrieval system for e-commerce platforms that integrates three advanced pre-trained models: BLIP, CLIP, and CLIP Interrogator. The system addresses the limitations of traditional keyword-based product search by enabling more accurate and efficient image-text matching. We trained and evaluated our approach on 413,000 image-text pairs from the Google Conceptual Captions dataset. Our method introduces a novel feature fusion mechanism that combines the strengths of several pre-trained models to achieve comprehensive visual-semantic understanding. The system performs strongly in both everyday commercial scenarios and complex artistic product descriptions. Experimental results show that the proposed method can generate detailed, context-aware descriptions and accurately match user queries to product images. The system's adaptability and semantic understanding make it particularly valuable for improving the user experience of e-commerce applications. This research contributes to the development of intelligent shopping platforms by bridging the gap between textual queries and visual content. Notably, the integration of the CLIP model significantly enhances the retrieval system's understanding of user intent and product semantics, making product recommendations more accurate and the search process more targeted.
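The abstract does not specify the feature fusion mechanism in detail. As a purely illustrative sketch (not the authors' implementation), one common scheme is late fusion: each pre-trained model independently scores the match between a text query and candidate product images via cosine similarity of its normalized embeddings, and the per-model scores are combined with a weighted sum. The embedding dimensions, weights, and random vectors below are placeholders standing in for real CLIP/BLIP encoder outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    """Scale rows to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-ins for embeddings from two pre-trained encoders (e.g. CLIP and BLIP).
# In a real system these come from the models' image and text towers.
clip_img = l2_normalize(rng.normal(size=(5, 512)))  # 5 candidate product images
clip_txt = l2_normalize(rng.normal(size=(512,)))    # one user query
blip_img = l2_normalize(rng.normal(size=(5, 256)))
blip_txt = l2_normalize(rng.normal(size=(256,)))

def fused_scores(w_clip=0.6, w_blip=0.4):
    """Late fusion: weighted sum of per-model cosine similarities.

    The weights are hypothetical; in practice they would be tuned on
    a validation split of the retrieval dataset.
    """
    s_clip = clip_img @ clip_txt  # cosine similarity per image under CLIP
    s_blip = blip_img @ blip_txt  # cosine similarity per image under BLIP
    return w_clip * s_clip + w_blip * s_blip

scores = fused_scores()
ranking = np.argsort(-scores)  # best-matching products first
print(ranking)
```

Because both similarity vectors are computed on unit-normalized embeddings, their scales are comparable, which is what makes a simple weighted sum a reasonable fusion baseline.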