The Evaluation of Generated Responses by ChatGPT to Complex Linguistics-Related Questions
Abstract
As a generative AI chatbot based on large language models (LLMs), ChatGPT (4o) offers numerous advantages and promising applications in applied linguistics (Alaqlobi et al., 2024). However, uncertainties and concerns remain regarding the accuracy of its responses to complex linguistics-related questions (Qamar et al., 2024; Nuland, 2024; Dale, 2021). To address this gap, five participants from five countries across three continents served as mediators in this study. Their task was to copy a set of predefined questions (syntax, n=1; semantics, n=1) from an email sent to them and input the questions into ChatGPT. The accuracy of the responses was assessed with the assistance of two senior lecturers in Linguistics at a private college in Athens, Greece. In addition to evaluating the questions and responses, the two experts labeled each response according to three criteria: accurate, partially accurate, and inaccurate. The categorized responses were then analyzed with a software program. The findings indicated that ChatGPT performed better on syntax-related questions than on semantics-related questions. More specifically, its overall performance on more analytical, subject-specific questions, such as those related to semantics, was below mediocre. This study contributes to the ongoing debate on the role of AI in linguistics and offers insights into the reliability and accuracy of ChatGPT's responses to complex questions.