Crowdsourced and AI-generated Age of Acquisition (AoA) Norms for Vocabulary in Print: Extending the Kuperman et al. (2012) norms
Abstract
This paper revisits the Age of Acquisition (AoA) norms of Kuperman et al. (2012) in three studies. Study 1 reports a crowdsourced 'megastudy' that obtained 790,024 estimates from participants, who gave the age at which they could first read and write each of 11,074 early-acquired words from Kuperman et al. (2012). The study aimed to differentiate between receptive oral-language AoA and print AoA. The results correlate well with the original estimates and yield slightly higher AoA values for print knowledge. They are released as a supplement to the original norms. Study 2 explored the potential of Large Language Models (LLMs), specifically GPT-4o, to replicate these new crowdsourced AoA norms. The findings indicated a strong correlation between AI-generated estimates and human judgments, showing the utility of AI in generating AoA estimates. The results confirm that AI is a valuable source of norms for psycholinguistic and educational research, particularly for under-resourced languages and researchers with limited resources. Building on the successful application of AI in Study 2, Study 3 extended the method to the entire set of words in the English Crowdsourcing Project (ECP), producing AI-generated AoA estimates for approximately 62,000 English words. This provides a substantial database of AoA norms that correlate very highly with human-generated estimates (r = .86) and perform well in accounting for word processing times in regression analyses. The AI-generated results have some important limitations, including overestimating the vocabulary acquired at some ages. All resources are available on the Open Science Framework for further exploration.
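As an illustration of the kind of comparison the abstract summarises with a correlation coefficient, the following is a minimal sketch of computing Pearson's r between human and AI-generated AoA estimates for the words the two sets share. The word lists and AoA values are invented for demonstration and are not taken from the published norms; the names `pearson_r`, `human_aoa`, and `ai_aoa` are hypothetical, and the authors' actual analysis pipeline is not described in this abstract.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical AoA estimates in years (illustrative values only, not real norms).
human_aoa = {"dog": 4.2, "table": 5.1, "whisper": 7.3, "justice": 9.8, "paradigm": 13.5}
ai_aoa = {"dog": 4.0, "table": 5.5, "whisper": 7.0, "justice": 10.5, "paradigm": 14.0}

# Compare only the words present in both sets, in a fixed order.
words = sorted(human_aoa.keys() & ai_aoa.keys())
r = pearson_r([human_aoa[w] for w in words], [ai_aoa[w] for w in words])
print(f"r = {r:.2f} over {len(words)} shared words")
```

With real data, the two dictionaries would be loaded from the norm files released on the Open Science Framework, and a library routine such as `scipy.stats.pearsonr` could replace the hand-written function.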