Recurrent issues with deep neural network models of visual recognition

Abstract

Object recognition requires flexible and robust information processing, especially in view of the challenges posed by naturalistic visual settings. The ventral stream in visual cortex is thought to derive this robustness from its recurrent connectivity. Recurrent deep neural networks (DNNs) have recently emerged as promising models of the ventral stream. In this study, we asked whether DNNs could be used to explore the role of different recurrent computations during challenging visual recognition. We assembled a stimulus set that included manipulations often associated with recurrent processing in the literature: occlusion, partial viewing, clutter, and spatial phase scrambling. We obtained a benchmark dataset from human participants performing a categorisation task on this stimulus set. By applying a wide range of model architectures to the same task, we uncovered a nuanced relationship between recurrence, model size, and performance. While recurrent models reached higher performance than their feedforward counterparts, we could not dissociate this improvement from the improvement obtained by simply increasing model size. Human and model patterns of difficulty across the visual manipulations were consistent, but this consistency was not modulated in an obvious way by the specific type of recurrence or size added to the model. Finally, depth/size, rather than recurrence, made model confusion patterns more human-like. Our findings challenge the notion that recurrent models are better models of human recognition behaviour than feedforward models, and emphasise the complexity of incorporating recurrence into computational models.

Author summary

Deep neural networks (DNNs) are considered the best current models of visual recognition. This is mainly due to the correspondence between their structure and that of the ventral stream in the primate visual system, as well as the match between their internal representations and behaviour on the one hand and human neural representations and error patterns on the other. Recently, it has been suggested that adding recurrence to the usually feedforward-only DNNs improves this match, while simultaneously making their architecture more brain-like. But how much of human behaviour do these models actually replicate, and does recurrence really make them better? We conducted an in-depth investigation of these questions by putting DNNs to the test. We asked: do models still resemble humans when the task becomes complicated, and do they use similar strategies to perform object recognition? Comparing different architectures, we show that recurrence tends to increase model performance and consistency with humans. However, we cannot dissociate this improvement from that brought by parameter count alone. Additionally, we find a strikingly worse match with human error patterns in models with recurrence, as compared to purely feedforward models. Our findings challenge the notion that recurrent models are better models of human recognition behaviour than feedforward models, and emphasise the complexity of incorporating recurrence into computational models.
