Word meaning, not surface statistics, is essential for predictive language processing
Abstract
Humans comprehend language incrementally, updating the representation of sentence meaning with each incoming word. These updates are guided by the distance between each perceived word and prior expectations, i.e., the prediction error. The alignment between large language models (LLMs) and cortical activity inspires the hypothesis that the cortical computation of prediction error is Surface-based, driven by statistical patterns of word form co-occurrence. In contrast, psycholinguistic models propose that prediction error computation is Meaning-based, driven by word semantics. We used polysemic words with ambiguous semantics to distinguish these models: ambiguity would introduce uncertainty into meaning representations, and hence into the prediction error, if the computation is Meaning-based, but would not affect the prediction error if it is Surface-based. We examined how ambiguity influenced prediction error signatures in self-paced reading times and magnetoencephalographic (MEG) neural responses during sentence processing. While an LLM-based proxy of prediction error robustly predicted reading times and neural responses to unambiguous words, it failed to predict either under ambiguity. That is, prediction error computation was altered by uncertainty in word meaning, which supports the Meaning-based model and corroborates the essential role of word meaning in predictive language processing. Our findings highlight an important limitation of LLMs as in silico models of the human language faculty.
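The abstract does not say which LLM-based proxy of prediction error was used. A common choice in this literature is surprisal, the negative log probability of a word given its preceding context, and the sketch below assumes that choice purely for illustration. The function name `surprisal_bits` and the toy probabilities are hypothetical and are not taken from the study.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Return the surprisal, in bits, of a word with conditional probability `prob`.

    Assumption: surprisal stands in for the "LLM-based proxy of prediction
    error"; the abstract itself does not name the measure.
    """
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in the interval (0, 1]")
    return -math.log2(prob)


# Made-up next-word probabilities for one context (illustrative only,
# not data from the study).
toy_next_word_probs = {"river": 0.5, "bank": 0.25, "shore": 0.25}

# Less expected words receive higher surprisal.
toy_surprisal = {word: surprisal_bits(p) for word, p in toy_next_word_probs.items()}
```

In practice the probabilities would come from a language model's next-word distribution. Under the Surface-based account described above, such a measure should track reading times and neural responses regardless of whether the word's meaning is ambiguous.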