A Unified Visual Architecture for Few-shot Learning
Abstract
Few-shot learning is a remarkable human cognitive capability. Core object recognition is a typical few-shot learning behavior in which the ventral visual stream plays a crucial role. The ventral visual stream consists of the cortical areas V1, V2, V4, and the inferior temporal (IT) cortex. Much research has discussed the mechanisms and functions of these areas in cognition, but they are seldom treated as an integrated system, making human learning mechanisms difficult to understand. To this end, we propose a unified computational model that simulates the ventral visual stream. It consists of an encoder and a classifier, which reproduce the earlier areas V1 and V2 and the later regions V4 and IT, respectively. The primary function of V1 and V2 is feature extraction, which we replicate with deep neural networks. V4 is responsible for complex feature representation, and IT for object recognition; we describe them with two coupled neural fields. Together with several biologically plausible strategies, the proposed model provides a biologically realistic, unified architecture for understanding human visual cognition and achieves excellent few-shot learning performance, outperforming state-of-the-art few-shot learning algorithms on real-world image datasets.
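The two-module architecture summarized above (a feature-extracting encoder for V1/V2 feeding two coupled fields for V4 and IT) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Amari-type neural field dynamics (tau * du/dt = -u + W f(u) + input), randomly initialized coupling weights, and a placeholder feature vector standing in for the encoder's output; all names (`u_v4`, `u_it`, `w_fwd`) are illustrative.

```python
import numpy as np

def sigmoid(x):
    """Firing-rate nonlinearity f(u)."""
    return 1.0 / (1.0 + np.exp(-x))

class CoupledNeuralFields:
    """Hypothetical sketch of two coupled Amari-type neural fields:
    a V4-like field driven by encoder features, and an IT-like field
    driven by the V4 field's activity (one-way feedforward coupling)."""

    def __init__(self, n, tau=10.0, seed=0):
        rng = np.random.default_rng(seed)
        self.tau = tau
        self.u_v4 = np.zeros(n)  # membrane potential of the V4-like field
        self.u_it = np.zeros(n)  # membrane potential of the IT-like field
        # Lateral interaction kernels within each field (random for the sketch)
        self.w_v4 = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n))
        self.w_it = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n))
        # Feedforward coupling: V4 activity -> IT input
        self.w_fwd = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n))

    def step(self, features, dt=1.0):
        """One Euler step of tau * du/dt = -u + W f(u) + external input."""
        f_v4 = sigmoid(self.u_v4)
        self.u_v4 += dt / self.tau * (-self.u_v4 + self.w_v4 @ f_v4 + features)
        self.u_it += dt / self.tau * (
            -self.u_it + self.w_it @ sigmoid(self.u_it) + self.w_fwd @ f_v4
        )
        return sigmoid(self.u_it)  # IT activity, read out for classification

# Stand-in for the encoder (V1/V2) output; a real model would use CNN features.
fields = CoupledNeuralFields(n=64)
features = np.full(64, 0.5)
for _ in range(50):
    it_activity = fields.step(features)
print(it_activity.shape)  # (64,)
```

In a full model, the IT field's steady-state activity would be compared against stored class prototypes to perform few-shot recognition; here it simply relaxes toward a fixed point under constant input.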