A Systematic Evaluation of Dutch Large Language Models’ Surprisal Estimates in Sentence, Paragraph, and Book Reading
Abstract
Studies using computational estimates of word predictability from neural language models have garnered strong evidence in favour of surprisal theory: upon encountering a word, readers experience processing difficulty that is a linear function of that word's surprisal. Evidence for this effect has been established for English or by using multilingual models to estimate surprisal across languages. At the same time, many language-specific models of unknown psychometric quality are made openly available. Here, we provide a systematic evaluation of a collection of large language models specifically designed for Dutch, examining how well their surprisal estimates account for reading times. We compare their performance to a multilingual model (mGPT) and an N-gram model. Across three eye-tracking corpora, a Dutch model predicted reading times better than the multilingual model. Dutch large language models replicate the general inverse scaling trend observed for English, with the surprisal estimates of smaller models showing a better fit to reading times than those of the largest models; however, this effect depends partly on the corpus used to evaluate the model. Surprisingly, in contrast to the linear effect of surprisal on reading times observed in the other corpora, a non-linear link fitted the GECO corpus best. Overall, these results offer a psychometric leaderboard of Dutch large language models and challenge the notion of a ubiquitous linear effect of surprisal. The complete set of surprisal estimates derived from all neural language models across the three corpora, along with the code used to extract them, is made publicly available (https://osf.io/wr4qf/).
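Surprisal here denotes the negative log-probability of a word given its preceding context, $s(w_t) = -\log_2 P(w_t \mid w_1, \dots, w_{t-1})$. The authors' own extraction code is available at the OSF link above; the sketch below is only a minimal, hypothetical illustration of how per-token surprisal can be obtained from a Hugging Face causal language model. The checkpoint name `ai-forever/mGPT` (the multilingual baseline mentioned in the abstract) and the example sentence are assumptions, not taken from the paper; any Dutch causal language model could be substituted.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the multilingual mGPT baseline mentioned in the abstract;
# substitute any Dutch causal language model from the Hugging Face Hub.
MODEL_NAME = "ai-forever/mGPT"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def token_surprisals(text: str) -> list[tuple[str, float]]:
    """Surprisal in bits, -log2 p(token | left context), for every token after the first."""
    ids = tokenizer(text, return_tensors="pt").input_ids  # shape: (1, T)
    with torch.no_grad():
        logits = model(ids).logits  # shape: (1, T, vocab)
    # The logits at position t predict the token at position t + 1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    chosen = log_probs[torch.arange(targets.size(0)), targets]
    bits = (-chosen / math.log(2)).tolist()
    return list(zip(tokenizer.convert_ids_to_tokens(targets.tolist()), bits))

# Hypothetical example; a word's surprisal is the sum over its subword tokens.
print(token_surprisals("De kat zat op de mat."))
```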