AI-generated estimates of Dutch words and expressions

Marc Brysbaert
Javier Conde
Juan Haro
Carlos Arriaga
Pedro Reviriego

Abstract

This study introduces and validates GPT_FAM, an AI-generated resource of familiarity estimates for 935,000 Dutch words and 201,000 multiword expressions. Based on previous studies, we hypothesized that such estimates, particularly when fine-tuned using a few thousand human ratings, would offer a useful, scalable measure of verbal knowledge. The results confirmed this expectation, showing that fine-tuned GPT estimates correlate well with word prevalence, which reflects the likelihood that a word is recognized. Equally important, GPT_FAM estimates significantly predict response latencies in lexical decision tasks, emerging as the most robust predictor in a random forest analysis that also included word frequency and length. The measure may be especially useful for assessing the difficulty of morphologically complex items, such as inflected word forms and transparent compounds, where traditional frequency metrics tend to be ineffective. Both untuned and fine-tuned estimates are freely available for research and educational purposes.

Version published to 10.21203/rs.3.rs-8058427/v1 on Research Square
Mar 19, 2026
