The Evaluation of Generated Responses by ChatGPT to Complex Linguistics-Related Questions
Abstract
As a generative AI chatbot based on large language models (LLMs), ChatGPT (4o) offers numerous advantages and promising applications in applied linguistics (Alaqlobi et al., 2024). However, uncertainties and concerns remain regarding the accuracy of its responses to complex linguistics-related questions (Qamar et al., 2024; Nuland, 2024; Dale, 2021). To address this gap, five participants from five countries across three continents served as mediators in this study. Their task was to copy a set of predefined questions (syntax, n=1; semantics, n=1) from an email sent to them and input the questions into ChatGPT. The accuracy of the responses was assessed with the assistance of two senior lecturers in Linguistics at a private college in Athens, Greece. In addition to evaluating the questions and responses, the two experts labeled each response according to three criteria: accurate, partially accurate, and inaccurate. The categorized responses were then analyzed with a software program. The findings indicated that ChatGPT performed better on syntax-related questions than on semantics-related questions. More specifically, its overall performance on more analytical, subject-specific questions, such as those related to semantics, was below mediocre. This study contributes to the ongoing debate on the role of AI in linguistics and offers insights into the reliability and accuracy of ChatGPT's responses to complex questions.