Bridging Behavioral Gaps: Automatic Extrapolation of Concreteness Norms for Arabic and English with a k-Nearest Neighbor Approach

Marina Aziz
Urban Knupleš
Diego Frassinelli
Sabine Schulte im Walde

Abstract

This article addresses the automatic extrapolation of concreteness norms for nouns in English and Modern Standard Arabic, a low-resource language with different cultural characteristics. The main goal of this study is to develop computational methods for reliably estimating the degree of word concreteness (e.g., apple vs. justice), supporting applications in psycholinguistics and natural language processing. To this end, we introduce a novel dataset of 202 Arabic nouns, rated for concreteness by humans and aligned with established English norms. Using both English and Arabic resources as gold standards, we compare the output of k-Nearest Neighbors (KNN) regression models built on FastText and transformer-based embeddings with ratings generated by ChatGPT. The KNN models correlate highly with human ratings for both English and Arabic, whereas ChatGPT, while consistent across runs, yields lower correlations. These results clearly show that KNN is the most accurate method for concreteness estimation in low-resource settings. We conclude with quantitative and qualitative analyses of systematic patterns in model behavior and cross-cultural differences in concreteness norms.
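The general idea of KNN regression over word embeddings can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional vectors, the 1-5 rating scale, the cosine similarity measure, the unweighted neighbor mean, the choice of Spearman correlation, and all function names (`cosine`, `knn_predict`, `spearman`) are assumptions made purely for illustration. A real setup would use FastText or transformer embeddings and published concreteness norms in place of the toy data.

```python
import math

# Hypothetical toy data: (embedding, concreteness rating) pairs.
# Vectors and ratings are invented for illustration only.
TRAIN = {
    "apple":   ([0.90, 0.10], 4.8),
    "table":   ([0.80, 0.20], 4.9),
    "stone":   ([0.85, 0.15], 4.7),
    "justice": ([0.10, 0.90], 1.5),
    "freedom": ([0.20, 0.80], 1.6),
    "idea":    ([0.15, 0.85], 1.4),
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_predict(query, train, k=3):
    """Predict a rating as the unweighted mean of the k most similar rated words."""
    scored = sorted(
        ((cosine(query, vec), rating) for vec, rating in train.values()),
        key=lambda pair: pair[0],
        reverse=True,
    )
    top = scored[:k]
    return sum(rating for _, rating in top) / len(top)

def _ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def _pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation between predicted and human ratings."""
    return _pearson(_ranks(x), _ranks(y))

if __name__ == "__main__":
    # Unseen words with hypothetical embeddings.
    print("chair:", round(knn_predict([0.88, 0.12], TRAIN), 2))
    print("truth:", round(knn_predict([0.12, 0.88], TRAIN), 2))
```

In this sketch, an unrated word inherits the average rating of its nearest rated neighbors in embedding space, and `spearman` compares a list of such predictions against held-out human ratings. The value of `k` and the similarity measure would normally be tuned on the gold-standard data.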

Version published to 10.21203/rs.3.rs-7858980/v1 on Research Square
Nov 2, 2025

