Spatiotemporal evidence accumulation through saccadic sampling for object recognition

Abstract

Visual object recognition has been extensively studied under fixation conditions, but natural viewing involves frequent saccadic eye movements that scan multiple local informative features within an object (e.g., the eyes and mouth in a face image). These saccades are thought to contribute to object recognition by subserving the integration of sensory information across local features, but mechanistic models that explain human behavior during natural object recognition have yet to be established, owing to the presumed complexity of the interactions between the visual and oculomotor systems. Here, we employ a framework of perceptual decision making and show that human face and object categorization behavior with saccades can be quantitatively explained by a model that simply accumulates the sensory evidence available at each moment. Our model can fit human object-recognition behavior even under conditions in which people freely make saccades to scan local features, departing from past studies that required controlled eye movements to test trans-saccadic integration. Moreover, further experimental results confirmed that active saccade commands (efference copy) do not substantially contribute to behavioral performance. We therefore propose that object recognition with saccades can be approximated by a parsimonious decision-making model without assuming complex interactions between the visual and oculomotor systems.
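The core mechanism described above, accumulating momentary sensory evidence until a categorization decision is reached, follows the standard evidence-accumulation (drift-diffusion) scheme from perceptual decision making. The sketch below is a minimal illustration of that general scheme, not the authors' actual model; all names, parameters, and the noise settings are hypothetical.

```python
import random

def accumulate_evidence(samples, threshold=5.0):
    """Integrate momentary evidence samples until a decision bound is hit.

    Returns (choice, time): choice is +1 or -1 for whichever bound is
    reached first, or 0 if neither bound is reached within the stream.
    (Illustrative sketch only; not the model from the article.)
    """
    total = 0.0
    for t, s in enumerate(samples, start=1):
        total += s                # accumulate the evidence available now
        if total >= threshold:    # upper bound: commit to category +1
            return +1, t
        if total <= -threshold:   # lower bound: commit to category -1
            return -1, t
    return 0, len(samples)        # no commitment within the stream

# Hypothetical usage: noisy momentary evidence with a small positive
# drift, as if successive fixations favored category +1 on average.
random.seed(0)
samples = [0.3 + random.gauss(0.0, 1.0) for _ in range(200)]
choice, rt = accumulate_evidence(samples)
```

In this scheme, each fixation (or moment within one) contributes a noisy evidence sample, and the running total alone drives the decision, with no explicit term for the oculomotor command, mirroring the article's conclusion that efference copy is not needed to explain performance.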