Spatiotemporal evidence accumulation through saccadic sampling for object recognition
Abstract
Visual object recognition has been studied extensively under fixation conditions, but natural viewing involves frequent saccadic eye movements that scan multiple informative local features within an object (e.g., the eyes and mouth in a face image). Such visual exploration can facilitate object recognition, but mechanistic accounts of the contribution of saccades have yet to be established because of the presumed complexity of the interactions between the visual and oculomotor systems. Here, we present a framework that formulates object recognition as a process of accumulating evidence from local features across saccades to render a decision. This approach yields a simple model that quantitatively explains human face and object categorization behavior, even when people freely make saccades to scan local features, departing from past studies that required controlled eye movements to examine trans-saccadic integration. Notably, our experimental results showed that active saccade commands (efference copy) did not substantially contribute to behavioral performance and that saccade patterns were largely independent of the ongoing decision-making process. We therefore propose that object recognition with saccades can be approximated by a parsimonious decision-making model without assuming complex interactions between the visual and oculomotor systems.
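To make the idea of "accumulating evidence from local features across saccades" concrete, here is a minimal sketch. It is not the authors' fitted model: it assumes a generic drift-diffusion-style accumulator, a swapped-in technique chosen for illustration. The input format (`fixations` as `(feature_index, duration_s)` pairs), the per-feature gains `sensitivity`, the signed `coherence`, and the optional `bound` are all hypothetical. Consistent with the abstract's finding that saccade commands contributed little, the update contains no efference-copy term. Each fixated feature simply adds its own momentary evidence to a single decision variable.

```python
import numpy as np


def accumulate_over_saccades(fixations, sensitivity, coherence,
                             dt=0.01, noise_sd=1.0, bound=np.inf, rng=None):
    """Hypothetical bounded accumulation of evidence sampled from local features.

    fixations   : list of (feature_index, duration_s) pairs, one per fixation
                  (assumed input format, for illustration only).
    sensitivity : per-feature gain k_i (hypothetical parameters).
    coherence   : signed stimulus strength; its sign is the correct category.
    bound       : decision bound; np.inf means "decide when the stimulus ends".
    Returns (choice, decision_variable), where choice is +1 or -1.
    """
    rng = np.random.default_rng() if rng is None else rng
    dv = 0.0
    for feature, duration in fixations:
        n_steps = int(round(duration / dt))
        for _ in range(n_steps):
            # Momentary evidence depends only on the currently fixated feature;
            # no saccade-command (efference-copy) term enters the update.
            drift = sensitivity[feature] * coherence * dt
            noise = noise_sd * np.sqrt(dt) * rng.standard_normal()
            dv += drift + noise
            # Stop early if the accumulated evidence reaches a bound.
            if abs(dv) >= bound:
                return (1 if dv > 0 else -1), dv
    # No bound crossing: the sign of the decision variable sets the choice.
    return (1 if dv >= 0 else -1), dv


if __name__ == "__main__":
    # Usage: estimate accuracy for a hypothetical face with two features
    # (e.g., eyes = 0, mouth = 1) scanned in a random order of three fixations.
    rng = np.random.default_rng(0)
    gains = [2.0, 1.0]
    n_trials = 2000
    correct = 0
    for _ in range(n_trials):
        order = rng.integers(0, 2, size=3)
        fixations = [(int(f), 0.25) for f in order]
        choice, _ = accumulate_over_saccades(fixations, gains, 0.5, rng=rng)
        correct += choice == 1
    print(f"proportion correct: {correct / n_trials:.3f}")
```

In this sketch, the saccade sequence only determines *which* feature's gain applies at each moment. Because the decision variable is a plain sum, the order of fixations does not change the expected evidence, only its total. This mirrors, in simplified form, the abstract's claim that saccade patterns and decision-making can be treated as largely independent.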