A Unified Visual Architecture for Few-shot Learning
Abstract
Few-shot learning is a remarkable human cognitive capability. Core object recognition is a typical few-shot learning behavior in which the ventral visual stream plays a crucial role. The ventral visual stream consists of the cortical areas V1, V2, V4, and the inferior temporal (IT) cortex. Much research has discussed the mechanisms and functions of these areas in cognition, but they are seldom treated as an integrated system, making human learning mechanisms difficult to understand. To this end, we propose a unified computational model that simulates the ventral visual stream. It consists of an encoder and a classifier, which reproduce the earlier areas V1 and V2 and the later regions V4 and IT, respectively. The primary function of V1 and V2 is feature extraction, which we replicate with deep neural networks. V4 is responsible for complex feature representation, and IT for object recognition; we describe them with two coupled neural fields. Together with several biologically plausible strategies, the proposed model provides a biologically realistic, unified architecture for understanding human visual cognition and achieves excellent few-shot learning performance, outperforming state-of-the-art few-shot learning algorithms on real-world image datasets.
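The two-module architecture summarized above (a feature-extracting encoder for V1/V2 feeding two coupled fields for V4 and IT) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Amari-type neural field dynamics (tau * du/dt = -u + W f(u) + input), randomly initialized coupling weights, and a placeholder feature vector standing in for the encoder's output; all names (`u_v4`, `u_it`, `w_fwd`) are illustrative.

```python
import numpy as np

def sigmoid(x):
    """Firing-rate nonlinearity f(u)."""
    return 1.0 / (1.0 + np.exp(-x))

class CoupledNeuralFields:
    """Hypothetical sketch of two coupled Amari-type neural fields:
    a V4-like field driven by encoder features, and an IT-like field
    driven by the V4 field's activity (one-way feedforward coupling)."""

    def __init__(self, n, tau=10.0, seed=0):
        rng = np.random.default_rng(seed)
        self.tau = tau
        self.u_v4 = np.zeros(n)  # membrane potential of the V4-like field
        self.u_it = np.zeros(n)  # membrane potential of the IT-like field
        # Lateral interaction kernels within each field (random for the sketch)
        self.w_v4 = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n))
        self.w_it = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n))
        # Feedforward coupling: V4 activity -> IT input
        self.w_fwd = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n))

    def step(self, features, dt=1.0):
        """One Euler step of tau * du/dt = -u + W f(u) + external input."""
        f_v4 = sigmoid(self.u_v4)
        self.u_v4 += dt / self.tau * (-self.u_v4 + self.w_v4 @ f_v4 + features)
        self.u_it += dt / self.tau * (
            -self.u_it + self.w_it @ sigmoid(self.u_it) + self.w_fwd @ f_v4
        )
        return sigmoid(self.u_it)  # IT activity, read out for classification

# Stand-in for the encoder (V1/V2) output; a real model would use CNN features.
fields = CoupledNeuralFields(n=64)
features = np.full(64, 0.5)
for _ in range(50):
    it_activity = fields.step(features)
print(it_activity.shape)  # (64,)
```

In a full model, the IT field's steady-state activity would be compared against stored class prototypes to perform few-shot recognition; here it simply relaxes toward a fixed point under constant input.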