Spatial sampling of deep neural network features improves encoding models of foveal and peripheral visual processing in humans
Abstract
Deep neural networks (DNNs) are increasingly used to build encoding models that predict neural data recorded during visual stimulation. Their ability to process natural images makes them prime candidates for studying the neural profile underlying real-world visual perception. However, there are still prominent discrepancies in how DNNs and humans process visual information. One discrepancy lies in the spatial sampling of visual input: while DNNs sample their input uniformly at every spatial location, the human visual system samples differentially from central and peripheral regions. Here, we apply multiple spatial sampling strategies to DNN feature maps and incorporate them into encoding models that predict human EEG responses to a novel stimulus set consisting of large, high-quality natural scene images. By applying differential spatial sampling of DNN feature maps, we reveal distinct temporal profiles for the encoding of peripheral vs. central information in EEG signals. Moreover, we show that a differential spatial sampling derived from the density of retinal ganglion cells yields the best-performing encoding model when using DNN feature maps. We experimentally confirm this pattern by separately stimulating peripheral and central visual field regions, and demonstrate that the distinct temporal profiles for central and peripheral information are only revealed when using large-field stimuli. Together, these results show that aligning the spatial sampling of humans and DNN encoding models can improve predictions of neural data. The distinct temporal profiles for the encoding of peripheral vs. central information support a global-to-local processing hierarchy of real-world vision.
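To make the contrast between uniform and differential spatial sampling concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation. The function names `eccentricity_weights` and `sample_features`, the falloff form 1 / (1 + r / e0), the value of `e0`, and the random feature map are all assumptions chosen for illustration. In particular, the falloff stands in for the retinal ganglion cell density function mentioned in the abstract, which is not specified here.

```python
import numpy as np

def eccentricity_weights(height, width, e0=0.25):
    """Spatial weights that fall off with distance from the map center.

    The 1 / (1 + r / e0) form and the default e0 are illustrative
    assumptions, not the retinal ganglion cell density from the paper.
    """
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    r = np.sqrt(yy**2 + xx**2)      # normalized eccentricity of each location
    w = 1.0 / (1.0 + r / e0)        # larger weight near the center
    return w / w.sum()              # normalize so the weights sum to 1

def sample_features(fmap, weights):
    """Weighted spatial average of a (channels, height, width) feature map."""
    return np.tensordot(fmap, weights, axes=([1, 2], [0, 1]))

# Hypothetical feature map: 8 channels on a 15 x 15 spatial grid.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 15, 15))

uniform = np.full((15, 15), 1.0 / (15 * 15))   # every location weighted equally
central = eccentricity_weights(15, 15)          # center-weighted sampling

features_uniform = sample_features(fmap, uniform)
features_central = sample_features(fmap, central)
```

Each sampling strategy yields one feature vector per image. In an encoding model, such vectors would serve as predictors in a regression onto the EEG response at each time point, so that strategies can be compared by prediction accuracy.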