Sensory context as a universal principle of language in humans and LLMs
Abstract
Language is a fundamental human capacity. Large language models (LLMs) have presented the first viable model of language outside of humans, yet how these models learn and use language differs significantly from how humans do. Here, we compare LLMs and humans processing language with varying levels of sensory information – from disembodied written text to audiovisual videos of speakers – to demonstrate that, in both humans and LLMs, sensory context is critical for optimal language processing. We asked human participants to predict upcoming words within narratives presented as audiovisual, audio-only, or written language (N = 1500 total; 500 per modality). We compared these predictions across modalities as well as to predictions generated by LLMs. Human predictions were overall more accurate than those of LLMs, regardless of the modality of language processing. Compared to written language, both audiovisual and audio-only language increased the accuracy and consensus of human predictions and decreased alignment with LLMs. We identified that prosody, a central feature of the auditory signal, partially drove both the observed accuracy advantage within humans and the divergence from LLMs. Integrating multimodal information (e.g., prosodic, auditory, or audiovisual signals) with the representation of language learned during LLM training improved models' next-word prediction performance and increased the efficiency of language learning. These findings demonstrate that sensory contexts are foundational to human-like language behavior, and that these contexts can enrich and accelerate language acquisition within LLMs, similar to what is observed in human development.
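To make the comparison concrete, below is a minimal sketch (not the authors' code) of how one might score an LLM's next-word prediction against a set of human guesses on a narrative fragment. It assumes a Hugging Face causal LM; the model name, context, guesses, and ground-truth word are all illustrative stand-ins, and the paper's stimuli and models may differ.

```python
# A minimal sketch of comparing human and LLM next-word predictions,
# assuming a Hugging Face causal LM (gpt2 as an illustrative stand-in).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def llm_top1_prediction(context: str) -> str:
    """Return the model's single most likely next token for a context."""
    inputs = tokenizer(context, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    next_id = logits[0, -1].argmax().item()
    return tokenizer.decode(next_id).strip()

# Hypothetical narrative fragment, human guesses, and ground truth.
context = "She opened the door and saw a"
human_guesses = ["dog", "man", "cat", "dog", "ghost"]
true_word = "dog"

llm_word = llm_top1_prediction(context)
# Accuracy: fraction of predictions matching the actual next word.
human_accuracy = sum(g == true_word for g in human_guesses) / len(human_guesses)
# Alignment: fraction of human guesses matching the LLM's prediction.
alignment = sum(g == llm_word.lower() for g in human_guesses) / len(human_guesses)
print(f"LLM guess: {llm_word!r} | human accuracy: {human_accuracy:.2f} | "
      f"human-LLM alignment: {alignment:.2f}")
```

The abstract also reports that integrating multimodal information with the LLM's learned language representation improves next-word prediction. One simple way to realize such integration, sketched below under the assumption that it means concatenating per-token prosodic features with the LLM's hidden states before a prediction head, is a small fusion module; the class, dimensions, and feature choices here are hypothetical, not the paper's architecture.

```python
# A minimal sketch of fusing prosodic features with LLM hidden states
# for next-word prediction. All names and dimensions are illustrative.
import torch
import torch.nn as nn

class ProsodyFusionHead(nn.Module):
    def __init__(self, hidden_dim: int, prosody_dim: int, vocab_size: int):
        super().__init__()
        # Project the concatenated (text + prosody) representation to logits.
        self.proj = nn.Linear(hidden_dim + prosody_dim, vocab_size)

    def forward(self, hidden: torch.Tensor, prosody: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, hidden_dim); prosody: (batch, seq, prosody_dim)
        fused = torch.cat([hidden, prosody], dim=-1)
        return self.proj(fused)  # next-token logits per position

# Usage with random tensors standing in for real features:
head = ProsodyFusionHead(hidden_dim=768, prosody_dim=4, vocab_size=50257)
hidden = torch.randn(1, 10, 768)   # e.g., final-layer LM states
prosody = torch.randn(1, 10, 4)    # e.g., pitch, energy, duration, pause
logits = head(hidden, prosody)     # shape: (1, 10, 50257)
```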