Bridging Behavioral Gaps: Automatic Extrapolation of Concreteness Norms for Arabic and English with a k-Nearest Neighbor Approach
Abstract
This article addresses the automatic extrapolation of concreteness norms for nouns in English and Modern Standard Arabic, a low-resource language whose cultural characteristics differ from those of English. The main goal of this study is to develop computational methods for reliably estimating the degree of word concreteness (e.g., apple vs. justice), supporting applications in psycholinguistics and natural language processing. To this end, we introduce a novel dataset of 202 Arabic nouns, rated for concreteness by human participants and aligned with established English norms. Using both the English and Arabic resources as gold standards, we compare the ratings predicted by k-Nearest Neighbors (KNN) regression models built on FastText and transformer-based embeddings with ratings generated by ChatGPT. The KNN models correlate highly with human ratings for both English and Arabic, whereas ChatGPT, although consistent across runs, yields lower correlations. These results indicate that KNN is the most accurate of the compared methods for concreteness estimation in low-resource settings. We conclude with quantitative and qualitative analyses of systematic patterns in model behavior and of cross-cultural differences in concreteness norms.
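To illustrate the general idea behind KNN-based norm extrapolation, the following minimal sketch predicts the concreteness of an unrated word from the human ratings of its most similar rated ("seed") words in embedding space. It then scores the predictions against held-out human ratings with a correlation coefficient. The abstract does not specify the similarity measure, the value of k, whether neighbours are weighted, or which correlation coefficient is reported. This sketch therefore assumes cosine similarity, similarity-weighted averaging with k = 3, and Pearson's r. The 3-dimensional vectors and all ratings below are hypothetical toy values standing in for FastText or transformer embeddings and real norms. The function names are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(query: Vector,
                seeds: Dict[str, Tuple[Vector, float]],
                k: int = 3) -> float:
    """Hypothetical helper: predict a concreteness score as the
    similarity-weighted mean rating of the k most similar seed words."""
    if not seeds:
        raise ValueError("need at least one rated seed word")
    # Rank all seed words by similarity to the query and keep the top k.
    neighbours = sorted(
        ((cosine(query, vec), rating) for vec, rating in seeds.values()),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Clip negative similarities so they cannot invert the weighting.
    weights = [max(sim, 0.0) for sim, _ in neighbours]
    total = sum(weights)
    if total == 0.0:
        # No positively similar neighbour: fall back to an unweighted mean.
        return sum(r for _, r in neighbours) / len(neighbours)
    return sum(w * r for w, (_, r) in zip(weights, neighbours)) / total


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation between predicted and human ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


# Hypothetical toy data: (embedding, human concreteness rating).
seeds = {
    "apple":   ([0.90, 0.10, 0.00], 5.0),
    "table":   ([0.80, 0.20, 0.10], 4.9),
    "stone":   ([0.85, 0.05, 0.20], 4.8),
    "justice": ([0.10, 0.90, 0.10], 1.5),
    "idea":    ([0.05, 0.85, 0.20], 1.6),
    "freedom": ([0.15, 0.80, 0.00], 1.4),
}
held_out = {
    "chair": ([0.88, 0.15, 0.05], 4.9),
    "truth": ([0.10, 0.88, 0.10], 1.6),
    "soil":  ([0.80, 0.10, 0.25], 4.7),
    "hope":  ([0.08, 0.90, 0.05], 1.5),
}

predictions = [knn_predict(vec, seeds) for vec, _ in held_out.values()]
gold = [rating for _, rating in held_out.values()]
for word, pred in zip(held_out, predictions):
    print(f"{word}: predicted {pred:.2f}")
print(f"Pearson r = {pearson(predictions, gold):.3f}")
```

In this toy setup, words whose vectors lie near the concrete seeds inherit high ratings and words near the abstract seeds inherit low ones. The printed correlation plays the role of the agreement with human gold-standard ratings described above. For real embeddings, scikit-learn's `KNeighborsRegressor` offers a comparable off-the-shelf option (for example with `metric="cosine"`). Note that its `weights="distance"` setting weights neighbours by inverse distance rather than by similarity.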