Measuring children’s early vocabulary in low-resource languages using a Swadesh-style word list

Abstract

Early language skill is predictive of many later life outcomes, and is thus of great interest to developmental psychologists and clinicians. The MacArthur-Bates Communicative Development Inventories (CDIs), parent report instruments that typically contain inventories of hundreds of children's vocabulary words, have proven to be valid and reliable instruments for measuring children's early language skill. The CDIs have been adapted to dozens of languages, and cross-linguistic comparisons show both consistency and variability in language acquisition trajectories. However, thousands of languages have neither CDIs nor the early language corpora needed to create them, which poses a significant barrier to increasing the diversity of languages that are studied. Here, we propose a method for selecting candidate words to include on new CDIs by analyzing the psychometric properties of the translation-equivalent concepts that are frequently included on existing CDIs. Leveraging 32 datasets from existing CDIs, we propose a list of 100 concepts that have low variability in their cross-linguistic learning difficulty. This pool of common concepts (analogous to the Swadesh lists, which are basic vocabulary lists used in glottochronology for cross-language comparison) can be used as a starting point for future CDI adaptations. We show that the proposed Swadesh-CDI list generalizes well to data from 10 additional languages.
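
The selection idea described above, keeping concepts whose learning difficulty varies little across languages, can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say how difficulty or variability is measured, so the sketch assumes one numeric difficulty estimate per concept per language and uses the population standard deviation across languages as the variability measure. The function name `select_stable_concepts`, the `min_languages` cutoff, and the toy numbers are all hypothetical.

```python
from statistics import pstdev


def select_stable_concepts(difficulty, k, min_languages=2):
    """Return the k concepts with the lowest cross-linguistic spread.

    difficulty maps concept -> {language: difficulty estimate}.
    Both the input format and the use of standard deviation are
    assumptions made for illustration only.
    """
    scored = []
    for concept, by_language in difficulty.items():
        values = list(by_language.values())
        # Skip concepts attested in too few languages to judge variability.
        if len(values) < min_languages:
            continue
        scored.append((pstdev(values), concept))
    # Lowest spread first; ties fall back to alphabetical order of concept.
    scored.sort()
    return [concept for _, concept in scored[:k]]


# Toy, made-up difficulty values (not data from the article).
toy = {
    "dog": {"lang_a": -1.0, "lang_b": -0.9, "lang_c": -1.1},
    "ball": {"lang_a": -0.5, "lang_b": -0.7, "lang_c": -0.3},
    "uncle": {"lang_a": 0.2, "lang_b": 1.5, "lang_c": -0.8},
}
selected = select_stable_concepts(toy, k=2)
print(selected)
```

In this toy input, "uncle" is dropped because its difficulty differs most between the hypothetical languages, while the two concepts with the smallest spread are kept.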
