Face Processing in Humans and CNNs: Comparing the Reliance on Holistic and Local Feature-Based Information

Abstract

It is intensely debated whether Convolutional Neural Networks (CNNs) constitute appropriate models of human vision. Here, we investigated whether CNNs show a typical characteristic of human face perception, namely holistic processing. In Experiment 1, we compared unfamiliar face matching performance between a CNN trained on face recognition and N = 32 human participants for three types of face images: normal faces (with intact holistic and local feature-based information), Mooney faces (with intact holistic and degraded local feature-based information), and scrambled faces (with intact local feature-based information and degraded holistic information). The CNN showed significantly larger performance decrements than human participants for both Mooney and scrambled faces. In Experiment 2, we trained three CNN architectures on face recognition: one with an unrestricted receptive field size and two with receptive field sizes restricted to approximately 1/9 and 1/16 of the input image, respectively. We then compared unfamiliar face matching performance between these CNNs and N = 36 human participants, who viewed the face images either without restriction or through a movable, spotlight-like viewing aperture covering approximately 1/9 or 1/16 of the image. While restricting the visual input with apertures substantially impaired human face matching accuracy, CNN performance was not affected by restricting the receptive field size. These results suggest that (a) CNNs are able to achieve high face matching accuracy without using holistic information, and (b) the reliance on holistic information in CNNs depends on the specific optimisation conditions under which the models were trained.
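The receptive field restriction in Experiment 2 can be made concrete with the standard recurrence for the theoretical receptive field of a stack of convolution and pooling layers. Each layer with kernel size k widens the field by (k - 1) times the current spacing, in input pixels, between adjacent units (the "jump"), and the layer's stride multiplies that spacing. The sketch below is illustrative only. The abstract does not describe the architectures, so the layer stacks, the 96 x 96 input size and the helper names `Layer`, `receptive_field` and `area_fraction` are hypothetical. We also assume that "1/9" and "1/16" refer to image area (roughly 1/3 and 1/4 of the image side). Padding and dilation are ignored, and the result is the theoretical, not the effective, receptive field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """Hypothetical description of one conv or pooling layer (square kernel, no dilation)."""
    kernel: int
    stride: int


def receptive_field(layers: list[Layer]) -> int:
    """Side length, in input pixels, of the theoretical receptive field of the last layer's units."""
    rf = 1    # before any layer, a unit "sees" a single input pixel
    jump = 1  # input-pixel distance between neighbouring units of the current layer
    for layer in layers:
        rf += (layer.kernel - 1) * jump  # widen the field by (k - 1) steps of the current spacing
        jump *= layer.stride             # striding spreads the next layer's units further apart
    return rf


def area_fraction(rf: int, input_size: int) -> float:
    """Fraction of a square input image covered by a square receptive field (clipped to the image)."""
    side = min(rf, input_size)
    return (side / input_size) ** 2


if __name__ == "__main__":
    # Hypothetical stacks for an assumed 96 x 96 input image.
    ninth = [Layer(2, 2)] * 5                     # five 2x2 stride-2 layers
    sixteenth = [Layer(2, 2)] * 3 + [Layer(3, 1)]  # three 2x2 stride-2 layers, then a 3x3 conv
    for name, stack in [("1/9 stack", ninth), ("1/16 stack", sixteenth)]:
        rf = receptive_field(stack)
        print(name, rf, round(area_fraction(rf, 96), 4))
```

With a 96 x 96 input, the first stack gives units a 32-pixel field (1/9 of the image area) and the second a 24-pixel field (1/16). Such a field only bounds what the model can use if later layers do not combine information across spatial positions before the final decision, which is a further design assumption of this sketch.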
