Brain-Guided Convolutional Neural Networks Reveal Task-Specific Representations in Scene Processing
Abstract
Scene categorization is the dominant proxy for visual understanding, yet humans can perform a large number of visual tasks within any scene. Consequently, we know little about how different tasks change how a scene is processed and represented, and how its features are ultimately used. Here, we developed a novel brain-guided convolutional neural network (CNN) in which each convolutional layer was separately guided by neural responses recorded at different time points while observers performed two different tasks on the same set of images. We then reconstructed each layer’s activation maps via deconvolution to spatially assess how different features were used as a function of task. Beginning around 244 ms and persisting to 402 ms, the brain-guided CNN made use of image features that human observers identified as crucial to completing each task. Critically, because the same images were used across the two tasks, the CNN could succeed only if the neural data captured task-relevant differences. Our analyses of the activation maps across layers revealed that the brain’s spatiotemporal representation of local image features evolves systematically over time. This underscores how distinct image features emerge at different stages of processing, shaped by the observer’s goals and behavioral context.
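To make the architecture concrete, the sketch below illustrates one way layer-wise neural guidance could be implemented: each convolutional stage carries an auxiliary readout trained to predict the neural responses from its assigned time window, so that earlier layers are guided by earlier responses. This is a minimal hypothetical sketch, not the authors' published implementation; the layer sizes, the 100-dimensional neural response vectors, the MSE guidance term, and the loss weighting are all illustrative assumptions.

```python
# Hypothetical sketch of a brain-guided CNN in PyTorch. Each conv stage is
# paired with an auxiliary linear readout that predicts neural responses
# from that stage's assigned time window (e.g., windows spanning the
# ~244-402 ms range reported in the abstract). All sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BrainGuidedCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        # Three conv stages; stage i is guided by neural responses
        # recorded at time window i.
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                          nn.ReLU(), nn.MaxPool2d(2))
            for c_in, c_out in [(3, 32), (32, 64), (64, 64)]
        ])
        # One readout per stage, predicting an (assumed) 100-dimensional
        # neural response vector for that stage's time window.
        self.readouts = nn.ModuleList([nn.Linear(c, 100) for c in (32, 64, 64)])
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        neural_preds = []
        for stage, readout in zip(self.stages, self.readouts):
            x = stage(x)
            pooled = F.adaptive_avg_pool2d(x, 1).flatten(1)  # (N, C)
            neural_preds.append(readout(pooled))
        logits = self.classifier(F.adaptive_avg_pool2d(x, 1).flatten(1))
        return logits, neural_preds


def joint_loss(logits, labels, neural_preds, neural_targets, weight=1.0):
    """Task loss plus a per-layer neural-guidance term (assumed MSE)."""
    task = F.cross_entropy(logits, labels)
    guide = sum(F.mse_loss(p, t) for p, t in zip(neural_preds, neural_targets))
    return task + weight * guide
```

Under this formulation, training the network jointly on the task label and on per-layer neural prediction forces each layer's representation toward the neural responses from its time window; because the images are identical across tasks, any task-specific differences the network learns must originate in the neural data, consistent with the logic of the study.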