Image-grounded encoding models reveal distinct temporal profiles of naturalistic object and scene processing in the human brain

Niklas Müller
H. Steven Scholte
Iris I. A. Groen

Abstract

The human brain processes large amounts of complex visual information in order to interact effectively with its environment. It is well established that the visual system has specialized regions that process incoming information efficiently, such as scene-, face-, and object-selective areas, which can be mapped using functional magnetic resonance imaging (fMRI). However, such mapping is typically done with experimental stimuli in which visual information is artificially separated (e.g., scene backgrounds vs. isolated object cutouts), and fMRI signals are insensitive to potential fine-grained temporal differences in visual information processing. Here, we identify temporal signatures of neural object and scene processing in real-world visual environments by building image-grounded, brain-predictive encoding models of human electroencephalography (EEG) responses. In a large set of high-resolution natural images, we separate object from scene information on a per-image basis, and then feed each type of information to separate deep neural network-based encoding models that predict EEG responses to the intact natural images. We find that encoding models that receive only object information consistently exhibit a delayed temporal encoding profile compared to models that receive only scene information. Control analyses confirm the robustness of this delayed object encoding, showing that consistent selection of object or scene information is needed to achieve high encoding performance. Using these distinct encoding profiles as templates, we identify the typicality of individual object classes and scene elements and determine how they are represented in human EEG recordings. Overall, our results show that temporally resolved recording during intact natural image viewing allows us to delineate distinct temporal profiles of the processing of specific visual elements in complex real-world environments. These findings demonstrate that image-grounded encoding models are a powerful tool for isolating components of naturalistic perception.
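The central comparison in the abstract (whether an encoding model fed only object information peaks later than one fed only scene information) can be illustrated with a minimal time-resolved encoding sketch. This is not the authors' pipeline. It assumes a plain linear ridge regression from precomputed image features to EEG amplitudes, fitted independently for every channel and time point, with random vectors standing in for deep neural network activations and a simulated EEG dataset in which scene features are, by construction, expressed early and object features late. The function names, the 100 Hz sampling rate, the 300/100 train/test split, and the `alpha` value are all illustrative assumptions.

```python
import numpy as np


def ridge_predict(X_train, Y_train, X_test, alpha=1.0):
    """Closed-form ridge regression, W = (X'X + alpha*I)^-1 X'Y, on centered data.
    Y_train may have many columns (e.g. every channel x time point); all are
    solved with a single linear system because the solution is linear in Y."""
    x_mean, y_mean = X_train.mean(axis=0), Y_train.mean(axis=0)
    Xc, Yc = X_train - x_mean, Y_train - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    W = np.linalg.solve(gram, Xc.T @ Yc)
    return (X_test - x_mean) @ W + y_mean


def columnwise_pearson(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return (A * B).sum(axis=0) / np.sqrt((A ** 2).sum(axis=0) * (B ** 2).sum(axis=0))


def encoding_time_course(features, eeg, n_train, alpha=1.0):
    """features: (n_images, n_features); eeg: (n_images, n_channels, n_times).
    Fits on the first n_train images, then returns the held-out correlation
    between predicted and measured EEG at each time point, averaged over channels."""
    n_images, n_channels, n_times = eeg.shape
    Y = eeg.reshape(n_images, n_channels * n_times)  # column = channel * n_times + t
    Y_pred = ridge_predict(features[:n_train], Y[:n_train], features[n_train:], alpha)
    r = columnwise_pearson(Y_pred, Y[n_train:])
    return r.reshape(n_channels, n_times).mean(axis=0)


# --- Simulated example (hypothetical data, not from the study) ---
rng = np.random.default_rng(0)
n_images, n_train, n_features, n_channels, n_times = 400, 300, 20, 8, 50
times_ms = np.arange(n_times) * 10  # assumed 100 Hz sampling: 0-490 ms

# Random stand-ins for DNN features of the scene-only and object-only inputs.
scene_feats = rng.standard_normal((n_images, n_features))
object_feats = rng.standard_normal((n_images, n_features))


def envelope(peak_ms, width_ms=30.0):
    """Gaussian temporal envelope centered on peak_ms."""
    return np.exp(-0.5 * ((times_ms - peak_ms) / width_ms) ** 2)


# Synthetic EEG: scene features drive an early response, object features a later one.
scene_drive = scene_feats @ rng.standard_normal((n_features, n_channels))
object_drive = object_feats @ rng.standard_normal((n_features, n_channels))
eeg = (scene_drive[:, :, None] * envelope(100)
       + object_drive[:, :, None] * envelope(250)
       + rng.standard_normal((n_images, n_channels, n_times)))

scene_r = encoding_time_course(scene_feats, eeg, n_train)
object_r = encoding_time_course(object_feats, eeg, n_train)
scene_peak_ms = times_ms[np.argmax(scene_r)]
object_peak_ms = times_ms[np.argmax(object_r)]
print(scene_peak_ms, object_peak_ms)
```

The peak of each correlation time course gives a simple latency estimate for that input type. A real analysis would typically add cross-validation, noise-ceiling normalization, and statistics on the latency difference, none of which this sketch includes.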

Significance

The neural processes underlying visual perception are typically studied using carefully designed experiments that artificially separate different kinds of information (for example, showing either isolated object cutouts or 3D scene backgrounds in separate stimulus conditions) to identify specific neural pathways for object vs. scene perception. However, since different types of visual information normally co-occur in the world (objects are typically embedded within scenes), this strict experimental control results in a trade-off with ecological validity. Here, we resolve this trade-off by applying image-grounded neural encoding models to real-world images. We combine the richness and ecological validity of natural images that depict multiple objects in everyday scene backgrounds with the precision of image-computable encoding models, in which we separate object and scene information at the pixel level in a controlled manner. By comparing how well multiple encoding models, each receiving distinct input elements, predict the EEG responses of human participants viewing the full, intact scenes, we show that object and scene information are processed distinctly, consistent with a coarse-to-fine hierarchy in which scene processing precedes object processing. Further, we elucidate which individual parts of the scene, such as buildings, trees, or the ground, are prominently encoded in EEG signals, revealing which elements of visual environments drive neural responses to intact, complex, real-world images. This approach opens new possibilities for studying human visual perception under more natural, real-world conditions.
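Separating object and scene information at the pixel level can be sketched as complementary masking of each image. This is a hypothetical illustration, not the authors' procedure. It assumes a per-pixel class-label map (for example, from a semantic segmentation), a hand-chosen set of class ids treated as "objects", and that removed pixels are filled with the image's mean color. The helper names and the class id 7 are made up for the example.

```python
import numpy as np


def object_mask_from_labels(label_map, object_class_ids):
    """label_map: (H, W) int array of per-pixel class ids.
    object_class_ids: ids treated as 'object' classes (an assumption);
    every other pixel is treated as 'scene'."""
    return np.isin(label_map, list(object_class_ids))


def split_object_scene(image, object_mask, fill_value=None):
    """image: (H, W, 3) float array; object_mask: (H, W) bool, True on object pixels.
    Returns (object_only, scene_only), each the same shape as image. Pixels removed
    from either version are replaced by fill_value (default: mean color of image)."""
    if image.shape[:2] != object_mask.shape:
        raise ValueError("object_mask must match the image height and width")
    if fill_value is None:
        fill_value = image.reshape(-1, image.shape[-1]).mean(axis=0)
    mask = object_mask[..., None]  # (H, W, 1) broadcasts over color channels
    object_only = np.where(mask, image, fill_value)
    scene_only = np.where(mask, fill_value, image)
    return object_only, scene_only


# --- Toy example with hypothetical labels ---
rng = np.random.default_rng(1)
image = rng.random((4, 4, 3))
labels = np.zeros((4, 4), dtype=int)  # class 0: hypothetical scene/background class
labels[1:3, 1:3] = 7                  # class 7: hypothetical object class
mask = object_mask_from_labels(labels, {7})
object_only, scene_only = split_object_scene(image, mask)
```

Because the two versions are exact complements, every original pixel survives in exactly one of them; each version could then be passed to its own feature extractor and encoding model while the EEG targets remain the responses to the intact image.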

Version published to 10.64898/2026.04.24.720581 on bioRxiv
Apr 24, 2026

Related articles

Diffusion-based stimulus optimization reveals functional organization across higher visual cortex
Authors: Margaret M. Henderson, Andrew F. Luo, Sungjoon Park, Michael J. Tarr, Leila Wehbe
Latest version: May 15, 2026

From Coarse to Rich: Successive Waves of Visual Perception in Prefrontal Cortex
Authors: Joachim Bellet, Markus Siegel, Stanislas Dehaene, Bechir Jarraya, Theofanis Panagiotaropoulos, Timo van Kerkoerle
Latest version: Mar 28, 2026

Active vision is linked to category selectivity in the individual brain
Authors: Diana Kollenda, Elaheh Akbarifathkouhi, Maximilian Davide Broda, Benjamin de Haas
Latest version: Apr 16, 2026
