Assessing the potential of LLM-assisted annotation for corpus pragmatics: the case of humor

Abstract

Corpus pragmatics faces ongoing challenges in quantitatively studying context-dependent categories like humor, given their subjectivity and the need for costly interrater reliability checks. Recent advances in LLMs offer a potential way to streamline these processes for pragmatic annotation tasks. This paper investigates that potential through an analysis of Italian political discourse on X, focusing on humorous tweets and their discursive functions (Attardo, 2020). We compare the performance of GPT-4o, LLaMA-3.3-70B-Instruct, and a novice annotator against that of an expert annotator. For the detection of humor, both models reached high agreement with the expert annotator (in particular, GPT-4o: Cohen's κ = 0.75; AC1 = 0.87). By contrast, agreement dropped for the classification of humor functions (GPT-4o: Cohen's κ = 0.37; AC1 = 0.70). Qualitative results suggest that the models rely heavily on lexical cues rather than demonstrating deeper pragmatic competence. These findings indicate that while LLMs can provide useful assistance in the initial stages of large-scale annotation, they remain limited in capturing the nuanced and context-dependent nature of pragmatic functions.
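The abstract reports agreement between model and expert annotations using Cohen's κ and Gwet's AC1. The sketch below is a minimal, illustrative example of how such scores can be computed for two label sequences; it is not the authors' pipeline, and the labels shown are hypothetical. Cohen's κ comes from scikit-learn, while AC1 is implemented from Gwet's standard two-rater formula.

```python
# Illustrative sketch only (assumption: not the authors' code or data).
# Compares two annotators' labels with Cohen's kappa and Gwet's AC1.
from sklearn.metrics import cohen_kappa_score

def gwet_ac1(labels_a, labels_b):
    """Gwet's AC1 for two raters labeling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    categories = sorted(set(labels_a) | set(labels_b))
    q = len(categories)
    # Observed agreement: proportion of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement based on average category prevalence across both raters.
    pi = {c: (labels_a.count(c) + labels_b.count(c)) / (2 * n) for c in categories}
    p_e = sum(p * (1 - p) for p in pi.values()) / (q - 1)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels: expert vs. model judgments on whether each tweet is humorous.
expert = ["humor", "humor", "none", "none", "humor", "none"]
model  = ["humor", "none",  "none", "none", "humor", "none"]

print("Cohen's kappa:", round(cohen_kappa_score(expert, model), 2))
print("Gwet's AC1:  ", round(gwet_ac1(expert, model), 2))
```

Reporting AC1 alongside κ is useful because κ can be deflated when one category (e.g., non-humorous tweets) strongly dominates the data, whereas AC1 is less sensitive to such prevalence effects.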
