Predictive vision-language integration in the human visual cortex
Abstract
Integrating linguistic and visual information is a core function of human cognition, yet how information from these two modalities interacts in the brain remains largely unknown. Competing frameworks, including the hub-and-spoke model and Bayesian theories such as predictive coding, offer conflicting accounts of how the brain achieves multimodal integration. To address this question, we collected a large-scale fMRI dataset and leveraged state-of-the-art AI systems to construct encoding models that probe how the human brain matches and integrates linguistic and visual information. We found that prior information from one modality can modulate neural responses in another, even in the early visual cortex (EVC). Integration-related neural responses in the EVC are governed by prediction errors, consistent with predictive coding theory. Enhanced and suppressed neural responses to semantically matched cross-modal stimuli were found in distinct EVC populations, with the suppressed population carrying denser, behaviorally relevant semantic information. Both populations support semantic integration, with distinct temporal dynamics and representational structures. These findings provide representational- and computational-level insights into how the brain integrates information across modalities, revealing unified principles of information processing that link biological and artificial intelligence.
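The abstract does not specify the encoding-model pipeline. As a rough, hypothetical illustration of the general approach it names, a typical voxelwise encoding analysis fits a regularized linear map from pretrained model features to each voxel's fMRI response and scores prediction accuracy on held-out stimuli. The sketch below uses simulated data; the feature dimensions, ridge penalty, and scoring choice are assumptions, not the authors' method.

```python
# Minimal sketch of a voxelwise encoding model: ridge regression from
# (hypothetical) vision-language model features to fMRI voxel responses.
# All data are simulated; dimensions and the ridge penalty are assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_stimuli, n_features, n_voxels = 1200, 512, 300

# Stand-in for features extracted from a pretrained vision-language model.
X = rng.standard_normal((n_stimuli, n_features))

# Simulated voxel responses: a sparse linear mapping plus noise.
mask = rng.random((n_features, n_voxels)) < 0.05
true_weights = rng.standard_normal((n_features, n_voxels)) * mask
Y = X @ true_weights + rng.standard_normal((n_stimuli, n_voxels))

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

# One ridge model predicts all voxels jointly (multi-output regression).
model = Ridge(alpha=100.0)
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

def per_voxel_correlation(y_true, y_pred):
    """Pearson r between observed and predicted responses, per voxel."""
    y_true = y_true - y_true.mean(axis=0)
    y_pred = y_pred - y_pred.mean(axis=0)
    num = (y_true * y_pred).sum(axis=0)
    denom = np.sqrt((y_true ** 2).sum(axis=0) * (y_pred ** 2).sum(axis=0))
    return num / denom

r = per_voxel_correlation(Y_test, Y_pred)
print(f"median held-out voxel correlation: {np.median(r):.3f}")
```

In practice, voxels whose held-out correlation exceeds a noise-based threshold would be treated as well predicted by the feature space; comparing feature spaces from different modalities in this way is one common route to probing cross-modal integration.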