BLiMP-NL: A corpus of Dutch minimal pairs and acceptability judgements for language model evaluation

Stefan L. Frank
Michelle Suijkerbuijk
Zoë Prins
Marianne de Heer Kloots
Jelle Zuidema

Abstract

We present a corpus of 8400 Dutch sentence pairs, intended primarily for the grammatical evaluation of language models. Each pair consists of a grammatical sentence and a minimally different ungrammatical sentence. The corpus covers 84 paradigms, classified into 22 syntactic phenomena. Ten sentence pairs of each paradigm were created by hand, while the remaining ninety were generated semi-automatically and manually validated afterwards. Nine of the ten hand-crafted sentences of each paradigm were rated for acceptability by at least 30 participants each, and for the same nine sentences word-by-word reading times were recorded through self-paced reading. Here, we report on the construction of the dataset, the measured acceptability ratings and reading times, and the extent to which a variety of language models can predict both the ground-truth grammaticality and the human acceptability ratings.
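As an illustration of how a minimal-pair corpus of this kind is typically used, the sketch below counts a pair as correct when a model assigns the grammatical sentence a higher score than its ungrammatical counterpart. This is the common BLiMP-style comparison and is an assumption here: the abstract does not specify the exact metric the authors use. The function name `minimal_pair_accuracy`, the example sentences, and the scores in `TOY_SCORES` are all hypothetical and are not taken from the corpus or from any real model. The final lines check the corpus size stated in the abstract.

```python
from typing import Callable, Iterable, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Return the fraction of (grammatical, ungrammatical) pairs in which
    the grammatical sentence receives the strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no sentence pairs given")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


# Invented example pairs (NOT from the corpus): subject-verb agreement.
PAIRS = [
    ("De jongen loopt.", "De jongen lopen."),
    ("De katten slapen.", "De katten slaapt."),
]

# Made-up sentence scores standing in for a language model's
# log-probabilities; a real evaluation would query an actual model.
TOY_SCORES = {
    "De jongen loopt.": -12.1,
    "De jongen lopen.": -15.8,
    "De katten slapen.": -11.4,
    "De katten slaapt.": -10.9,
}


def toy_score(sentence: str) -> float:
    """Look up the made-up score for a sentence."""
    return TOY_SCORES[sentence]


accuracy = minimal_pair_accuracy(PAIRS, toy_score)

# Corpus size as described in the abstract:
# 84 paradigms, each with 10 hand-crafted and 90 generated pairs.
N_PARADIGMS, N_HANDCRAFTED, N_GENERATED = 84, 10, 90
total_pairs = N_PARADIGMS * (N_HANDCRAFTED + N_GENERATED)
```

With these toy scores the first pair is ranked correctly and the second is not, so `accuracy` is 0.5, and `total_pairs` matches the 8400 pairs reported in the abstract. Passing the scorer in as a callable keeps the comparison logic independent of any particular model.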

Version published to 10.31234/osf.io/mhjbx_v2 on OSF Preprints
Mar 10, 2025
Version published to 10.31234/osf.io/mhjbx_v1 on OSF Preprints
Apr 15, 2024

Related articles

Human and LLM accent rating of English-L2 speech by Brazilian speakers

This article has 2 authors:
1. Felipe Flores Kupske
2. Laura Zorzi
This article has no evaluations. Latest version: Dec 14, 2025

Cross-linguistic zero-shot communication via ad-hoc pseudowords

This article has 3 authors:
1. Fritz Guenther
2. Aliona Petrenco
3. Daniele Gatti
This article has no evaluations. Latest version: Jan 5, 2026

Learning from the input: a corpus-based investigation of Chinese classifiers in children’s books and child-directed speech

This article has 5 authors:
1. Jinyu Shi
2. Yaling Hsiao
3. Yifan Yang
4. Elizabeth Wonnacott
5. Kate Nation
This article has no evaluations. Latest version: Feb 4, 2026
