Synthesizing Human-Like Conversational Search Interactions with Large Language Models
Abstract
Training effective conversational search systems is often hindered by the scarcity of high-quality labeled conversational data. To address this challenge, we propose LLM-Driven Conversational Search Session Synthesis (LLM-CSSS), a generative framework that uses large language models (LLMs) to synthesize realistic multi-turn conversational search sessions. Our method fine-tunes a pre-trained LLM and lets it interact with a simulated search environment built on the Amazon Review dataset to generate contextually relevant user utterances and system responses. We compare our approach experimentally with several baselines, including a state-of-the-art session data generation method and a random generation strategy. Conversational search models trained on the synthetic data generated by LLM-CSSS significantly outperform those trained on other data sources, as measured by MAP, NDCG, BLEU, and METEOR. Human evaluation further confirms that the conversations generated by our method are more coherent, relevant, and informative, and of higher overall quality. Our work highlights the potential of LLMs for addressing data scarcity in conversational search and paves the way for more robust and user-friendly conversational information retrieval systems.
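For readers unfamiliar with the two ranking metrics named above, the following is a minimal sketch of how MAP and NDCG are commonly computed. It is not the evaluation code used in this work. The function names are hypothetical, and the sketch assumes that every relevant item for a query appears somewhere in the ranked list being scored, so both normalizations are taken over that list alone.

```python
import math


def average_precision(relevance):
    """Average precision for one ranked list of binary labels (1 = relevant).

    Assumption: all relevant items appear in `relevance`, so the sum is
    divided by the number of hits found in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs):
    """MAP: the mean of average precision over several ranked lists."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)


def dcg(gains, k):
    """Discounted cumulative gain over the top-k graded relevance values."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def ndcg(gains, k):
    """NDCG@k: DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Two hypothetical queries with binary relevance labels per ranked result.
    print(mean_average_precision([[1, 0, 1], [0, 1]]))
    # One hypothetical query with graded relevance, scored at cutoff 3.
    print(ndcg([2, 3, 0, 1], k=3))
```

Both metrics reward placing relevant results early in the ranking. MAP works on binary labels, while NDCG accepts graded relevance and discounts gains logarithmically by rank.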