Morphemes in the wild: Modelling affix learning from the noisy landscape of natural text

Maria Korochkina
Marco Marelli
Kathleen Rastle

Abstract

Morphological knowledge serves as a powerful heuristic for vocabulary growth and contributes significantly to the speed and efficiency of reading. While research has long sought to explain how knowledge of derivational morphology is acquired, previous approaches have struggled to capture the nuanced and complex ways in which derivational morphemes are used in written language. In particular, they have struggled to capture that these morphemes contribute to meaning in a graded manner and that noise introduced by misleading forms (e.g., deliver) can impede learning. Our approach builds on earlier insights but moves beyond them by combining a large-scale analysis of the vocabulary used in 1,200 popular books with computational modelling to explore how derivational affixes may be learned from text containing naturally occurring noise. We use a compositional distributional semantic model to investigate what can be learned about the meanings of individual English prefixes and suffixes through reading, and we evaluate the model’s performance against data from 120 adults in a lexical processing task. Our findings demonstrate that, despite the presence of noise, natural text contains sufficient structure to support the extraction of core affix semantics, and that readers are attuned to the complex patterns that shape affix use in the wild. This work contributes a new dimension to a more principled and psychologically grounded account of morpheme learning, and we discuss both this contribution and the broader insights it offers for language research.

Version published to 10.31234/osf.io/yzcqm_v2 on OSF Preprints
Dec 12, 2025
Version published to 10.31234/osf.io/yzcqm_v1 on OSF Preprints
Aug 1, 2025

Related articles

Factors Influencing L2 Learners’ Use of the English Dative Construction: Insights from a Learner Corpus
Junya Fukuta, Akira Murakami, Masato Terai, Yu Tamura
Latest version Dec 26, 2025

Temporality in the Arabic Lexicon: Morphological Encoding of Experiential States in the faʕla:n Adjectival Template
Marwan Jarrah, Shahd Dibas, Basem Al-Raba’a
Latest version Jan 9, 2026

Substitute-Space Embeddings for Label-Free Syntax: Unsupervised AI for POS Discovery
Vipul Razdan
Latest version Jan 8, 2026
