Empirical Evaluation of Automatic Speech Act Classification: From Logistic Regression to GPT-4o

Abstract

Speech act classification assigns communicative functions to utterances, a task central to understanding the pragmatic dimension of language. Automating this process can substantially reduce the need for manual annotation and enable large-scale pragmatic analysis. In this study, we evaluate several automatic classification approaches on the SPICE-Ireland dataset, which is pragmatically annotated according to Searle's taxonomy of illocutionary speech acts. We compare traditional machine learning models (Logistic Regression, Random Forest, and XGBoost), fine-tune a RoBERTa language model, and apply a large language model (GPT-4o) through zero- and few-shot prompting. Fine-tuning GPT-4o yields the best performance, with an accuracy of 87% and a macro-averaged F1-score of 67%; relative to the logistic regression baseline, this corresponds to a normalized gain of 46% in accuracy and 39% in F1-score. A qualitative analysis of disagreement cases further indicates that the fine-tuned GPT-4o often aligns with expert judgments, suggesting that automatic metrics likely underestimate its true performance. These results highlight the potential of LLM-generated speech act annotations. To facilitate future research, we make our code and fine-tuned RoBERTa model publicly available.
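
As a rough illustration of the zero-/few-shot prompting condition mentioned above, the sketch below classifies a single utterance into Searle's illocutionary categories with GPT-4o through the OpenAI Python SDK. The label set, prompt wording, and few-shot examples are assumptions for illustration only, not the authors' exact setup.

```python
# Illustrative sketch: zero-/few-shot speech act classification with GPT-4o
# via the OpenAI Python SDK. Labels, prompt, and examples are assumed, not
# taken from the paper or the SPICE-Ireland annotation guidelines.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Searle's five illocutionary classes (the corpus tag set may differ in detail).
LABELS = ["representative", "directive", "commissive", "expressive", "declaration"]

SYSTEM_PROMPT = (
    "You annotate speech acts. Classify the utterance into exactly one of "
    f"Searle's illocutionary categories: {', '.join(LABELS)}. "
    "Answer with the label only."
)

# Invented few-shot demonstrations; omit them for the zero-shot condition.
FEW_SHOT = [
    {"role": "user", "content": "Utterance: Could you close the window?"},
    {"role": "assistant", "content": "directive"},
    {"role": "user", "content": "Utterance: I'll send you the report tomorrow."},
    {"role": "assistant", "content": "commissive"},
]

def classify(utterance: str, few_shot: bool = True) -> str:
    """Return the predicted speech act label for a single utterance."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    if few_shot:
        messages += FEW_SHOT
    messages.append({"role": "user", "content": f"Utterance: {utterance}"})
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        temperature=0,  # keep the annotation output as deterministic as possible
    )
    return response.choices[0].message.content.strip().lower()

if __name__ == "__main__":
    print(classify("That was a wonderful talk, thank you!"))  # expected: expressive
```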
