Beyond next-word prediction: hierarchical linguistic composition modulates LLM-brain alignment in time

Abstract

The internal representations of large language models (LLMs) correlate, or “align”, with human neural activity during language comprehension. One view holds that this alignment reflects a sensitivity to statistical patterns shared by LLMs and humans, while another holds that it reflects, at least in part, the emergence of shared linguistic representations in these systems. Here, we investigate whether hierarchical linguistic composition, a property believed to be fundamental to human language, modulates LLM-brain alignment. To this end, we manipulated syntax, compositional semantics, and associative semantics in English sentences that were presented to both an LLM and human participants during an electroencephalography (EEG) experiment. We matched the linguistically manipulated stimuli in predictability, which allows us to separate alignment induced by linguistic structure from alignment induced by statistical factors. By comparing LLM-EEG alignment scores, derived using a linear encoding model, across predictability-matched conditions, we evaluate how linguistic manipulations modulate the alignment between human EEG reading data and contextual embeddings extracted word-by-word from the hidden layers of GPT2-XL. Three key patterns emerge: (1) increased alignment for word sequences with syntactic structure, (2) decreased alignment for sentences with compositional semantics, and (3) no modulation of alignment by associative semantics. These linguistic modulations of LLM-EEG alignment occur above and beyond predictability. Our results indicate that associative semantics is encoded similarly by LLMs and the brain, as are at least some aspects of syntactic structure, while compositional semantics is more uniquely encoded in the human brain.
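
As an illustration of what an alignment score from a linear encoding model can look like, the following is a minimal sketch and not the authors' pipeline. It assumes a ridge-regularized linear map from per-word embeddings to per-word EEG responses, scored by cross-validated Pearson correlation; the abstract does not specify the regularization, the cross-validation scheme, or the scoring metric, so all of these are assumptions. The function names (`ridge_fit`, `alignment_score`, `make_synthetic`) and the synthetic data are hypothetical stand-ins for GPT2-XL embeddings and EEG recordings.

```python
import numpy as np


def ridge_fit(X, Y, alpha):
    """Closed-form ridge solution W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def alignment_score(X, Y, alpha=1.0, n_folds=5):
    """Mean cross-validated Pearson r between predicted and observed signals.

    X: (n_words, n_dims) embeddings; Y: (n_words, n_channels) responses.
    The ridge penalty and fold count are assumptions, not values from the paper.
    """
    n = X.shape[0]
    folds = np.array_split(np.arange(n), n_folds)
    scores = []
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        X_train, Y_train = X[train_mask], Y[train_mask]
        X_test, Y_test = X[test_idx], Y[test_idx]

        # Standardize features and center targets using training statistics only.
        mu_x = X_train.mean(axis=0)
        sd_x = X_train.std(axis=0) + 1e-8
        mu_y = Y_train.mean(axis=0)

        W = ridge_fit((X_train - mu_x) / sd_x, Y_train - mu_y, alpha)
        Y_pred = ((X_test - mu_x) / sd_x) @ W + mu_y

        # Pearson correlation per channel, then averaged across channels.
        yp = Y_pred - Y_pred.mean(axis=0)
        yt = Y_test - Y_test.mean(axis=0)
        denom = np.linalg.norm(yp, axis=0) * np.linalg.norm(yt, axis=0) + 1e-12
        scores.append(((yp * yt).sum(axis=0) / denom).mean())
    return float(np.mean(scores))


def make_synthetic(n_words=400, n_dims=20, n_channels=8, noise=1.0, seed=0):
    """Synthetic 'embeddings' and linearly related 'EEG' (illustration only)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_words, n_dims))
    W = rng.standard_normal((n_dims, n_channels))
    Y = X @ W + noise * rng.standard_normal((n_words, n_channels))
    return X, Y


if __name__ == "__main__":
    X, Y = make_synthetic()
    print("alignment score:", alignment_score(X, Y))
```

Under this sketch, comparing conditions would amount to computing `alignment_score` separately on the words of each predictability-matched condition and contrasting the resulting scores.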
