A Context-Aware Hybrid Search Framework Integrating LLM Tagging and GPU Acceleration for Enhanced E-Commerce Product Discovery

Abstract

Traditional keyword-based retail search engines often struggle to deliver relevant results because they rely on exact matches, which limits user experience and product discovery. As consumer demands grow, optimizing search performance, particularly throughput and latency, has become crucial. To address these challenges, this study presents a novel hybrid search framework centered on a context-aware large language model (LLM) tag generation mechanism tailored to Traditional Chinese and specific market nuances (e.g., Taiwanese brands and trends). This core component is integrated with dense embedding and reranker models, and the entire system leverages GPU-accelerated technologies, such as RAPIDS cuDF for efficient large-scale data handling and the NVIDIA Triton Inference Server for optimized real-time inference, including dense embedding caching and dynamic batching.

The results demonstrate both the relevance and the performance efficiency of the framework. Incorporating context-aware LLM tags dramatically improves search relevance: it increases the intent-aligned conversion rate from 5.09% to 98.16% and enables the retrieval of all relevant items in specific tests in which keyword search failed. The performance optimizations also yield substantial gains: RAPIDS Dask-cuDF reduces data-processing latency by 85.5% compared with CPU-based Pandas; Triton Inference Server improves model-serving throughput by nearly 800% and reduces latency by 97% versus baseline CUDA execution; Redis caching drastically shortens the retrieval time for cached embeddings; and the LLM component achieves a throughput of 178.33 tokens/sec (benchmarked on Llama-3.1-8B via NIMS).

The optimized search framework has been deployed on the 711go e-commerce platform. The deployment resulted in a 50% increase in customer dwell time and a 40% increase in sales over the 90-day verification period, confirming that the system considerably enhances the consumer browsing experience and delivers tangible business value through improved search functionality.
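
The abstract mentions dense embedding caching but does not describe how it is implemented. As a rough illustration of the general idea only, the following is a minimal cache-aside sketch in Python. Everything in it is an assumption made for illustration: an in-memory dictionary stands in for Redis, `toy_embed` is a hypothetical placeholder for a real embedding model, and the query normalization and `emb:` key scheme are not taken from the paper.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


class EmbeddingCache:
    """Cache-aside lookup for query embeddings.

    Assumption: a plain dict stands in for an external store such as Redis.
    """

    def __init__(self, embed_fn: Callable[[str], Vector]) -> None:
        self._embed_fn = embed_fn
        self._store: Dict[str, Vector] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(text: str) -> str:
        # Assumed normalization: lowercase and collapse runs of whitespace.
        return " ".join(text.lower().split())

    def get(self, text: str) -> Vector:
        normalized = self._normalize(text)
        # Hypothetical key scheme: fixed prefix plus a hash of the normalized query.
        key = "emb:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        cached = self._store.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        # Cache miss: compute the embedding once and store it for later queries.
        self.misses += 1
        vector = self._embed_fn(normalized)
        self._store[key] = vector
        return vector


def toy_embed(text: str) -> Vector:
    # Hypothetical placeholder for a dense embedding model (not a real model).
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:4]]


cache = EmbeddingCache(toy_embed)
first = cache.get("Green Tea")        # miss: the embedding is computed
second = cache.get("  green   tea ")  # hit: same query after normalization
```

In this sketch, repeated or trivially different queries skip the embedding model entirely, which is the effect a cache in front of an inference server is meant to achieve. A production version would also need expiry and invalidation when the embedding model changes.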
