Temporal misalignment in scene perception: Divergent representations of locomotive action affordances in human brain responses and DNNs
Abstract
The human visual system processes scenes with remarkable speed, enabling the extraction of essential information to navigate our surroundings in a single glance. To elucidate how the brain transforms visual inputs into neural representations of navigationally relevant information, we collected electroencephalography (EEG) responses to diverse indoor and outdoor scenes, along with behavioral annotations of locomotive action affordances (e.g., walking, cycling), object annotations, and low-level image features to model distinct types of scene information. Using representational similarity analysis, we examined the neural representation of locomotive action affordances over time, their co-localization within scene-selective cortex, and their computational alignment with deep neural networks (DNNs). Our results show that locomotive action affordance representations emerge within 200 ms of visual processing and make unique contributions to EEG responses at time points distinct from those of objects and low-level properties. Spatiotemporal fusion with functional magnetic resonance imaging (fMRI) recordings in scene-selective brain regions reveals that both the parahippocampal place area (PPA) and occipital place area (OPA), but not the medial place area (MPA), contribute to locomotive action affordance representations, with a distinct temporal hierarchy between them. While DNNs predict early EEG responses well, they primarily capture low-level features and show limited alignment with affordance processing. These findings reveal a temporally distinct neural representation of action affordances and highlight a limitation of current DNNs in modeling affordance perception.