Image Classification on Small Datasets using Query Attention Module

Abstract

Vision Transformers (VTs) are increasingly popular in computer vision due to their robust global modeling. However, they lack the learning advantages of Convolutional Neural Networks (CNNs), which can be trained effectively with less data. This paper proposes query attention, a plug-and-play module that can be integrated with existing VTs and CNNs without altering the backbone structure. Furthermore, to reduce the training cost of the VT backbone, we combine query attention with a downsized backbone to construct a shrunk structure. We selected three classical VTs, ViT, Swin, and T2T-ViT, and tested them on four small datasets (CIFAR10, CIFAR100, Tiny-ImageNet, CINIC10). The results show that query attention improves model performance, especially with the shrunk structure, which maintains competitive performance while reducing computational and memory complexity. For example, on Tiny-ImageNet, adding query attention to ViT increases classification accuracy by 5.77%. Additionally, based on the shrunk structure of ViT, we achieved a 20% reduction in parameters and a 32.76% reduction in computation cost, while improving classification accuracy by 4.26%.
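The abstract does not specify the internals of the query attention module, but a common form of query-based attention uses a learnable query vector that attends over the backbone's token embeddings to pool a global feature. The sketch below is a hypothetical, minimal NumPy illustration of that idea under this assumption; the names `query_attention_pool`, `w_k`, and `w_v` are ours, not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def query_attention_pool(tokens, query, w_k, w_v):
    """Hypothetical query-attention pooling (a sketch, not the paper's exact module).

    tokens: (n, d) token embeddings produced by a VT or CNN backbone
    query:  (d,)   learnable query vector
    w_k, w_v: (d, d) key/value projection matrices
    Returns a single (d,) pooled feature for classification.
    """
    keys = tokens @ w_k                                # (n, d)
    values = tokens @ w_v                              # (n, d)
    scores = keys @ query / np.sqrt(query.shape[0])    # (n,) scaled dot-product
    weights = softmax(scores)                          # attention over tokens
    return weights @ values                            # (d,) weighted sum of values

# Toy usage with random data standing in for backbone outputs.
rng = np.random.default_rng(0)
n, d = 16, 8
tokens = rng.standard_normal((n, d))
query = rng.standard_normal(d)
w_k = rng.standard_normal((d, d))
w_v = rng.standard_normal((d, d))
out = query_attention_pool(tokens, query, w_k, w_v)
print(out.shape)  # (8,)
```

Because the module only consumes the backbone's token sequence and emits a fixed-size feature, it can be attached after any existing backbone without modifying its layers, which is consistent with the plug-and-play design described above.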
