Word predictability in Portuguese: Cloze norming study vs. LLMs
Abstract
With the rise of large language models (LLMs), these models have been proposed as a possible alternative to human participants in many scientific domains, including linguistic research such as cloze studies. Cloze probability tells researchers how predictable a word is within a given sentential context, and it is a common tool in linguistic studies of language production and processing. Several studies (e.g., Jacobs et al., 2022; Lopes Rego et al., 2024) have compared LLM performance with traditional cloze studies, and their results are promising. However, these studies were conducted in English, so we set out to assess LLM performance in Portuguese. We conducted correlation analyses between a traditional cloze study and two LLMs: Gervásio (Santos et al., 2024) and Tucano (Corrêa et al., 2024). The results show moderate and weak correlations between the cloze probabilities obtained from human participants and those derived from the LLMs. Therefore, LLMs still need to improve to reach human-level performance.
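As a rough illustration of the analysis described in the abstract, the sketch below computes a cloze probability for each target word (the share of participants who produced that word) and correlates those values with model-assigned probabilities. Everything in it is an assumption made for illustration: the toy responses, the `llm_probs` values, the helper names `cloze_probability` and `pearson_r`, and the choice of Pearson's r, since the abstract does not say which correlation coefficient was used or how probabilities were extracted from the models.

```python
from collections import Counter
from math import sqrt


def cloze_probability(responses, target):
    """Share of participants who completed the context with `target`."""
    counts = Counter(r.strip().lower() for r in responses)
    return counts[target.lower()] / len(responses)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: target word -> completions given by four participants.
human_responses = {
    "café": ["café", "café", "chá", "café"],
    "livro": ["livro", "jornal", "livro", "texto"],
    "porta": ["janela", "porta", "janela", "janela"],
}

# Hypothetical probabilities a language model assigns to the same targets.
llm_probs = {"café": 0.60, "livro": 0.35, "porta": 0.20}

targets = list(human_responses)
human = [cloze_probability(human_responses[t], t) for t in targets]
model = [llm_probs[t] for t in targets]

r = pearson_r(human, model)
print(f"human cloze: {human}")
print(f"Pearson r between human and model predictability: {r:.3f}")
```

In a real comparison, the model probabilities would come from each LLM's next-word distribution for the same sentence contexts shown to participants, and a rank-based coefficient such as Spearman's could be substituted if the probabilities are heavily skewed.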