Monkey See, Model Knew: Large Language Models Accurately Predict Visual Brain Responses in Humans and Non-Human Primates

Colin Conwell
Emalie MacMahon
Akshay Jagadeesh
Kasper Vinken
Saloni Sharma
Jacob S. Prince
George A. Alvarez
Talia Konkle
Margaret Livingstone
Leyla Isik

Abstract

Recent progress in multimodal AI and ‘language-aligned’ visual representation learning has rekindled debates about the role of language in shaping the human visual system. In particular, the emergent ability of ‘language-aligned’ vision models (e.g. CLIP) – and even pure language models (e.g. BERT) – to predict image-evoked brain activity has led some to suggest that human visual cortex itself may be ‘language-aligned’ in comparable ways. But what would we make of this claim if the same procedures could model visual activity in a species that does not have language? Here, we conducted controlled comparisons of pure-vision, pure-language, and multimodal vision-language models in their prediction of human (N=4) and rhesus macaque (N=6, 5:IT, 1:V1) ventral visual activity to the same set of 1000 captioned natural images (the ‘NSD1000’). The results revealed markedly similar patterns in model predictivity of early and late ventral visual cortex across both species. This suggests that language model predictivity of the human visual system is not necessarily due to the evolution or learning of language per se, but rather to the statistical structure of the visual world that is reflected in natural language.

Version published to 10.1101/2025.03.05.641284 on bioRxiv
Mar 10, 2025

Related articles

A Psychological-Scaling Approach to Unraveling the Nature of Pigeons’ Categorization of Natural Visual Objects

This article has 3 authors:
1. Odysseus Raymond Primitivo Orr
2. Edward A. Wasserman
3. Robert Nosofsky
This article has no evaluations. Latest version: Dec 19, 2025

Cognitive maps in the prefrontal cortex

This article has 4 authors:
1. Sebastijan Veselic
2. Elena Gutierrez
3. Mohamady El-Gaby
4. Mathias Sablé-Meyer
This article has no evaluations. Latest version: Jan 2, 2026

The Semantic Scaffold: Functional Dissociation of Visual and Language-derived Features Shapes Human Natural Scene Understanding

This article has 9 authors:
1. Yu Zhang
2. Yuxuan Tu
3. Zihan Yin
4. Jing Zhang
5. Weiyang Shi
6. Siyang Li
7. Jingguo Dai
8. Yongfu Hao
9. Tianzi Jiang
This article has no evaluations. Latest version: Jan 12, 2026
