CREST-Former: A Region-Enhanced Swin Transformer for Pest Recognition Based on Contrastive Learning

JiXiang Zou
WenZhong Yang
YaBo Yin
ZhiShan Feng
ChuangXiang Li

Abstract

Transformers can model long-range dependencies, which makes them an effective tool for pest classification in agricultural engineering. However, their self-attention mechanism often causes query tokens to focus excessively on local image patches, limiting the effective receptive field. To address this limitation, this paper proposes CREST-Former, a Region-Enhanced Swin Transformer for Pest Recognition Based on Contrastive Learning, which improves pest identification through new attention mechanisms and multi-scale feature extraction. The network integrates three modules: (1) a PDSwin Transformer block, which combines multi-receptive-field depthwise separable convolution with self-attention to capture features at different scales simultaneously, improving the model’s sensitivity to minute morphological features of insects; (2) a Discriminant Region Enhancement Module (DREM), which automatically identifies the most distinctive regions of pest morphology to improve classification accuracy; and (3) a discriminative region-guided contrastive learning framework, which significantly improves the intra-class compactness and inter-class separability of the learned features. Experiments show that CREST-Former achieves classification accuracies of 76.13%, 99.85%, and 79.16% on the IP102, D0, and CPB datasets, respectively. Heatmap visualization confirms that the model focuses precisely on discriminative morphological regions of pests, and the model has been successfully deployed on the Jetson Nano platform for practical applications.
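
The abstract does not give the form of the contrastive objective, so the following is only a minimal sketch of a generic supervised contrastive loss, included to illustrate what "intra-class compactness and inter-class separability" means in practice. It is not the authors' formulation: the function names, the temperature value, and the toy 2-D embeddings are all assumptions made for illustration, and the discriminative-region guidance described in the abstract is not modeled here.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss (an assumption, not the paper's loss).

    For each anchor, same-class samples are pulled together and all other
    samples in the batch are pushed away.
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # this anchor has no same-class partner in the batch
        # Temperature-scaled cosine similarity of anchor i to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all other samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of the positives for this anchor.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Hypothetical 2-D embeddings for four images from two pest classes.
labels = [0, 0, 1, 1]
compact = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.05, 0.99]]  # classes clustered
mixed = [[1.0, 0.0], [0.0, 1.0], [0.99, 0.05], [0.05, 0.99]]    # classes interleaved

loss_compact = supervised_contrastive_loss(compact, labels)
loss_mixed = supervised_contrastive_loss(mixed, labels)
print(loss_compact, loss_mixed)
```

In this toy batch, the embedding set whose classes form tight, well-separated clusters yields a much lower loss than the set in which same-class samples lie far apart, which is the behavior a contrastive objective of this kind rewards.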

Version published to 10.21203/rs.3.rs-6644439/v1 on Research Square
Aug 21, 2025

Related articles

ALMformer: a modified Transformer based on Adaptive frequency enhanced attention, large kernel convolution, and multi-scale implementation for bearing fault diagnosis

This article has 5 authors:
1. Xiao Chang
2. Shaobin Cai
3. Wanchen Cai
4. Yuchang Mo
5. Liansuo Wei
This article has no evaluations. Latest version: Jul 21, 2025

Enhancing Infrared-Visible Image Fusion via Text-Guided Adaptive Feature Integration

This article has 6 authors:
1. Jundong Zhang
2. Yanan Guo
3. Kangjian He
4. Dan Xu
5. SongHan Zheng
6. WenCheng Mei
This article has no evaluations. Latest version: Jul 23, 2025

A Multi-Scale Feature Fusion Dual-Branch Mamba-CNN Network for Landslide Extraction

This article has 3 authors:
1. Zhiheng Yang
2. Hua Zhang
3. Nanshan Zheng
This article has no evaluations. Latest version: Sep 2, 2025
