Shared texture-like representations, not global form, underlie deep neural network alignment with human visual processing
Abstract
Deep neural networks (DNNs) are a leading computational framework for understanding neural visual processing. A standard approach for evaluating their similarity to brain function uses DNN activations to predict human neural responses to the same images, yet which visual properties drive this alignment remains unclear. Here, we show that texture-like representations – operationalized as global summaries of local image statistics – largely underlie this alignment. We recorded electroencephalography (EEG) from 57 participants viewing three image types: natural scenes, ‘texture-synthesized’ versions that preserve global summaries of local statistics while disrupting global form, and isolated objects without backgrounds. Representational similarity analysis showed the strongest DNN-EEG alignment when both systems processed texture-synthesized images. Cross-prediction – using features from one image condition to predict EEG responses to another – showed that features from texture-synthesized images generalized to natural scenes. Crucially, we observed a dissociation between DNN-EEG alignment and decodable object category information: alignment increased for texture-synthesized images even when object information was reduced. Together, our findings identify global summaries of local image statistics as a common currency linking DNNs and human visual processing, and clarify that global form features are not required for high DNN-EEG alignment. They underscore the shared importance of local image statistics in artificial and biological visual systems.
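To make the representational similarity analysis concrete, the sketch below correlates a DNN-derived representational dissimilarity matrix (RDM) with an EEG-derived RDM, which is the standard form of the DNN-EEG alignment measure referenced above. All array names, shapes, and the random data are hypothetical stand-ins for extracted DNN activations and EEG response patterns, not the authors' actual pipeline.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# Hypothetical inputs: one row per image.
# dnn_features: (n_images, n_units)    -- activations from one DNN layer
# eeg_patterns: (n_images, n_channels) -- EEG amplitudes at one time point
rng = np.random.default_rng(0)
dnn_features = rng.standard_normal((120, 512))
eeg_patterns = rng.standard_normal((120, 64))

def rdm(patterns):
    """Representational dissimilarity matrix, returned as a condensed
    vector of 1 - Pearson correlation for every pair of images."""
    return pdist(patterns, metric="correlation")

# DNN-EEG alignment: rank correlation between the two RDMs.
rho, _ = spearmanr(rdm(dnn_features), rdm(eeg_patterns))
print(f"DNN-EEG representational alignment: rho = {rho:.3f}")
```

In a real analysis, this correlation would be computed separately per DNN layer, EEG time point, and image condition (natural, texture-synthesized, isolated object) to compare alignment across conditions.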
Significance Statement
Deep neural networks (DNNs) accurately predict human neural responses to images, but the image properties driving this alignment remain unclear. We recorded brain activity from people viewing natural photographs of objects, texture-only versions of those photos (which preserved fine visual detail but contained no recognizable objects), and isolated objects. DNN predictions matched human brain signals best for the texture-only images, despite their lack of semantic information; those same texture-based features also generalized to predicting brain responses to the natural photos. Strikingly, the DNNs' ability to predict brain responses was dissociated from the decodable object category information present in the brain activity. These findings suggest that broad texture patterns, rather than object shapes, underlie the alignment between DNNs and human vision, challenging shape-centric theories of visual processing.
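For readers who want to see the cross-prediction logic in code, here is a minimal sketch: a ridge-regression encoding model maps DNN features to EEG responses, is fit on one image condition (texture-synthesized), and is evaluated on another (natural scenes). The variable names, data shapes, regularization strength, and random data are illustrative assumptions, not the authors' reported pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_train, n_test, n_feat, n_chan = 100, 40, 512, 64

# Hypothetical data: DNN features and EEG responses per image condition.
X_texture = rng.standard_normal((n_train, n_feat))  # texture-synthesized images
y_texture = rng.standard_normal((n_train, n_chan))  # EEG to those images
X_natural = rng.standard_normal((n_test, n_feat))   # natural scenes
y_natural = rng.standard_normal((n_test, n_chan))   # EEG to natural scenes

# Fit the encoding model on the texture-synthesized condition...
model = Ridge(alpha=1.0).fit(X_texture, y_texture)

# ...and cross-predict EEG responses to natural scenes.
y_pred = model.predict(X_natural)
r_per_channel = [np.corrcoef(y_pred[:, c], y_natural[:, c])[0, 1]
                 for c in range(n_chan)]
print(f"mean cross-condition prediction r = {np.mean(r_per_channel):.3f}")
```

Generalization from texture-trained weights to natural-scene responses is the pattern the abstract reports; with the random placeholder data above, the mean correlation will of course hover near zero.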