KM-Chat: A Large-Scale Synthetic Question-Answer Dataset for Open-Domain Conversational AI

Tarek Barhoum
Mina Ibrahim
Karam Al Ghazi

Abstract

Recent advances in large language models (LLMs) have transformed natural language processing, particularly the development of conversational agents. Even so, building robust dialogue systems remains constrained by the limited availability of large-scale, high-quality conversational datasets. To address this gap, this study introduces KM-Chat, a synthetic question–answer dataset designed for open-domain conversational AI research. The dataset consists of 250,003 Q&A pairs, generated with state-of-the-art LLMs through a multi-stage pipeline that combines controlled sampling, iterative batch generation, and rigorous post-processing. KM-Chat covers a wide range of conversational contexts, spanning both general-purpose and technical domains, which broadens its contextual diversity. By emphasizing scalability, linguistic variety, and structural consistency, KM-Chat offers a resource for training and evaluating dialogue systems and for developing more human-like conversational models.

Version published to 10.21203/rs.3.rs-7294671/v1 on Research Square
Aug 6, 2025

Related articles

Conversations From Make-Believe: An Attentive Encoder–Decoder Chatbot Trained on Scripted Dialogue

This article has 1 author:
1. Sourabh Subhash Rajput
This article has no evaluations
Latest version Jan 29, 2026

BHRE-RAG: A Benchmark and Retrieval-Augmented Framework for Advancing Comprehension-Based Question Answering in Bangla

This article has 2 authors:
1. Md Saiyem Raiyan
2. Nayeema Ferdous
This article has no evaluations
Latest version Jan 23, 2026

Efficient and Responsible Transformer Based Conversational Agents for Emotionally Supportive Dialogue

This article has 8 authors:
1. DIVYA SALEELA
2. Akhil Mathew Philip
3. Reji R
4. Rincy Merlin Mathew
5. Teena Joseph
6. Sujith Kumar P S
7. Supriya L P
8. Chinchu M S
This article has no evaluations
Latest version Feb 2, 2026
