Experimental Evaluation of Machine Learning Models for Goal-oriented Customer Service Chatbot with Pipeline Architecture
Abstract
Integrating machine learning (ML) into customer service chatbots has significantly enhanced their ability to understand and respond to user queries. However, without rigorous evaluation, such systems may produce unnatural or inconsistent responses that degrade the user experience. In this study, we present an experimental evaluation approach tailored for goal-oriented customer service chatbots built using a pipeline architecture, focusing on three key components: Natural Language Understanding (NLU), Dialogue Management (DM), and Natural Language Generation (NLG). The proposed method is model-agnostic and emphasizes component-wise benchmarking through hyperparameter optimization and comparative analysis of candidate models. Specifically, we evaluate BERT and LSTM for the NLU component, DQN and DDQN for DM, and GPT-2 and DialoGPT for NLG. Experiments are conducted using the MultiWOZ dataset, with performance evaluated by intent accuracy, dialogue success rate, and BLEU, METEOR, and ROUGE scores. Results show that BERT achieves superior intent detection, while LSTM excels in slot filling. DDQN outperforms DQN in task success, dialogue efficiency, and reward accumulation. GPT-2 surpasses DialoGPT in text generation quality. These findings not only highlight the strengths of individual models but also provide a reusable evaluation framework for optimizing chatbot performance across components, offering practical insights for future development in both research and real-world applications.
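To make the NLG metrics named above concrete, the following is a minimal sketch (not taken from the paper) of how BLEU, METEOR, and ROUGE-L could be computed for a single generated response, assuming the nltk and rouge-score packages are available; the reference and hypothesis strings are invented for illustration.

\begin{verbatim}
# Minimal sketch: scoring one generated response against a gold
# reference with BLEU, METEOR, and ROUGE-L (hypothetical data).
# Assumes nltk and rouge-score are installed and the NLTK
# 'wordnet' corpus has been downloaded (needed by METEOR).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

reference = "your table for two is booked at 7 pm"   # gold response
hypothesis = "i booked a table for two at 7 pm"      # model output

ref_tokens, hyp_tokens = reference.split(), hypothesis.split()

# Sentence-level BLEU with smoothing, since short dialogue turns
# often have no higher-order n-gram overlap.
bleu = sentence_bleu([ref_tokens], hyp_tokens,
                     smoothing_function=SmoothingFunction().method1)

# METEOR; recent NLTK releases require pre-tokenized inputs.
meteor = meteor_score([ref_tokens], hyp_tokens)

# ROUGE-L F-measure via Google's rouge-score package
# (score() takes the target first, then the prediction).
rouge_l = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True) \
    .score(reference, hypothesis)["rougeL"].fmeasure

print(f"BLEU={bleu:.3f}  METEOR={meteor:.3f}  ROUGE-L={rouge_l:.3f}")
\end{verbatim}

In a component-wise evaluation such as the one described here, scores like these would be averaged over all system turns in the MultiWOZ test set for each candidate NLG model (GPT-2 and DialoGPT) before comparison.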