Feature Fusion Units for Fine-grained Image Categorization

Hua Zhao
Zujun Liu
Bin Yang
Tianyu Lu
Ying Xing

Abstract

Fine-grained image categorization aims to distinguish subclasses by processing detailed features, and it remains a critical open problem in computer vision because the differences between subclasses are small. Traditional methods usually locate features through manual annotation, specific sliding windows, different thresholds, and similar techniques. These methods are not only costly but also ineffective. In computer vision, the transformer greatly improves categorization accuracy by repeatedly calculating attention scores between parts of the image and weighting them. In this paper, we propose a feature weight unit. Specifically, a transformer is used as the backbone to capture image features (these features are called patches in the transformer), and then all patches are weighted by our feature weight unit. The computed output of the feature fusion unit represents how strongly each patch should be focused on. To verify the effectiveness of our method, we conducted experiments on the CUB-200-2011 and Stanford Dogs datasets.
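
The abstract does not specify how the unit computes its weights, so the following is only a minimal sketch of the general idea of scoring patches and weighting them by importance. It assumes a single linear scoring vector followed by a softmax; the function name `weight_patches`, the parameters `w` and `b`, and the 196 x 768 patch shape are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def weight_patches(patches, w, b=0.0):
    """Score each patch and return importance weights plus a fused feature.

    patches: (num_patches, dim) patch features from a transformer backbone.
    w, b:    hypothetical scoring parameters (assumed here, not from the paper).
    """
    scores = patches @ w + b      # one scalar score per patch
    weights = softmax(scores)     # non-negative weights that sum to 1
    fused = weights @ patches     # (dim,) importance-weighted sum of patches
    return weights, fused


# Stand-in data: random values in place of real backbone outputs.
rng = np.random.default_rng(0)
patches = rng.normal(size=(196, 768))   # assumed shape: 196 patches, 768 dims
w = rng.normal(size=768) * 0.01         # assumed, untrained scoring vector
weights, fused = weight_patches(patches, w)
```

In this sketch, `weights[i]` plays the role of the importance assigned to patch `i`, and `fused` is the resulting weighted feature that a classifier head could consume. With an all-zero `w`, every patch receives the same weight and `fused` reduces to the plain mean of the patches.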

Version published to 10.21203/rs.3.rs-7717226/v1 on Research Square
Nov 13, 2025

Related articles

Exploring the Collaboration Between Vision Models and LLMs for Enhanced Image Classification

This article has 3 authors:
1. Bhavya Rupani
2. Dmitry Ignatov
3. Radu Timofte
Latest version Dec 15, 2025

A Comprehensive Comparative Analysis of Convolutional Neural Network Architectures for Image Classification and Object Detection Tasks

This article has 3 authors:
1. Fahim Al Islam
2. Saif Hossain
3. Monir Hosen
Latest version Feb 3, 2026

MultiLingual Scene Text Detection via Group-Specific Models

This article has 5 authors:
1. Jhonatas Conceição
2. Manuel Córdova
3. Allan Pinto
4. Ricardo da S. Torres
5. Helio Pedrini
Latest version Dec 19, 2025
