Evaluating chatbot authenticity in simulations of spoken interaction: Demonstrating the utility of corpus-based methods for development and validation


Abstract

This study presents a methodological framework for applying corpus linguistics to systematically evaluate the authenticity of chatbot production in relation to (spoken) production in a general target language use domain. We demonstrate the approach through data drawn from an illustrative case study: a low-stakes formative assessment system in which learners interact with a ChatGPT-powered bot. A Chatbot Corpus of approximately 290,000 words from 600 simulations of target ChatGPT production was created, covering two GPT versions (3.5 and 4) and three “temperature” settings. This corpus was then compared with relevant subcorpora of the British National Corpus 2014, which contains 100 million words of British English collected in naturalistic settings. Analyses were conducted at the macro (multidimensional analysis), meso (comparative frequency analysis), and micro (occurrence of specific pragmatic features) levels. Findings demonstrated that ChatGPT-powered chatbot production was systematically more similar to genres of written than of spoken communication: output demonstrated higher lexical density and was characterised by a relatively low occurrence of features typical of spoken communication, such as stance and pragmatic markers. We argue that the methodological framework is applicable across different chatbot models, allowing researchers and developers to use this approach with newer, more refined AI-powered conversational agents in the future.
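To make the meso-level step concrete, the sketch below shows one common way to run a comparative frequency analysis between a chatbot corpus and a spoken reference corpus. The abstract does not specify the statistic or tooling used, so the choice of log-likelihood keyness (Rayson & Garside 2000), the function names, and the assumption of pre-tokenised input are illustrative, not the authors' implementation.

```python
import math
from collections import Counter

def log_likelihood(freq_a: int, freq_b: int, size_a: int, size_b: int) -> float:
    """Log-likelihood keyness statistic for one word observed freq_a times
    in a corpus of size_a tokens and freq_b times in one of size_b tokens."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    if freq_a > 0:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2 * ll

def keyness(chatbot_tokens: list[str], reference_tokens: list[str], top_n: int = 20):
    """Rank words by how strongly their relative frequency differs between
    a chatbot corpus and a (spoken) reference corpus."""
    fa, fb = Counter(chatbot_tokens), Counter(reference_tokens)
    na, nb = len(chatbot_tokens), len(reference_tokens)
    scores = {w: log_likelihood(fa[w], fb[w], na, nb) for w in set(fa) | set(fb)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Toy example with tokenised text; real input would be the full corpora.
chatbot = "furthermore the results indicate that the model performs well".split()
spoken = "well you know I mean it was kind of good actually".split()
print(keyness(chatbot, spoken, top_n=5))
```

Words with high keyness scores that are overrepresented in the chatbot corpus would, on this analysis, be candidates for the written-like features the study reports, while underrepresented items such as stance and pragmatic markers would signal the gap from spoken production.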
