CREST-Former: A Region-Enhanced Swin Transformer for Pest Recognition Based on Contrastive Learning

Abstract

Transformers, with their ability to model long-range dependencies, provide an effective means of pest classification in agricultural engineering. However, their self-attention mechanism often causes query tokens to focus too heavily on local image patches, limiting the effective receptive field. To address this challenge, this paper proposes CREST-Former, a Region-Enhanced Swin Transformer for Pest Recognition Based on Contrastive Learning, which improves pest identification through new attention mechanisms and multi-scale feature extraction. The network integrates three modules: (1) a PDSwin Transformer block, which combines multi-receptive-field depthwise separable convolution with self-attention to capture features at different scales simultaneously, enhancing the model’s perception of minute morphological features of insects; (2) a Discriminant Region Enhancement Module (DREM), which automatically identifies the most distinctive regions of pest morphology to improve classification accuracy; and (3) a discriminative region-guided contrastive learning framework, which improves the intra-class compactness and inter-class separability of the learned features. Experiments show that CREST-Former achieves classification accuracies of 76.13%, 99.85%, and 79.16% on the IP102, D0, and CPB datasets, respectively. Heatmap visualization confirms that the model focuses on discriminative morphological regions of pests, and the model has been deployed on the Jetson Nano platform for practical applications.
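The abstract does not give the form of the contrastive objective, so the following is only a minimal sketch of what "intra-class compactness and inter-class separability" can mean in practice. It uses a generic supervised contrastive loss as a stand-in, not the paper's discriminative region-guided formulation. The function names, the `temperature` value, and the toy embeddings are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative stand-in, not the paper's loss).

    For each anchor, samples sharing its label are positives and all other
    samples are negatives. The loss is low when same-class embeddings are
    close together and different-class embeddings are far apart.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        # Average negative log-probability of picking a positive.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy 2-D embeddings: two tight clusters (hypothetical data).
features = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
loss_aligned = supervised_contrastive_loss(features, [0, 0, 1, 1])
loss_mixed = supervised_contrastive_loss(features, [0, 1, 0, 1])
print(loss_aligned < loss_mixed)
```

When the labels match the clusters, the loss is small; when the labels cut across the clusters, it is large. That gap is the signal such an objective uses to pull same-class features together and push different classes apart.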
