An Efficient Hybrid Convolutional Vision Transformer Framework with Spatial Attention for Rice Leaf Disease Identification and Categorization


Abstract

Disease detection and categorization in rice leaves plays a crucial role in mitigating crop damage and supporting sustainable agriculture. Traditional approaches, which often rely on manual inspection, are limited by labor intensity, variability, and error susceptibility. This paper introduces a Hybrid Convolutional Vision Transformer (CVT) model with Spatial Attention (SA) to enhance detection accuracy and classification reliability for rice leaves. The proposed CVT framework integrates a Convolutional Neural Network (CNN) backbone for initial feature extraction with a Vision Transformer (ViT) for advanced feature representation. The CNN captures essential textures and shapes, while the ViT applies attention across image patches, effectively learning the complex spatial dependencies necessary for identifying disease-specific characteristics within diverse field environments. Further, the SA module refines the model by assigning greater weight to diseased regions, reducing interference from non-leaf background areas. Experimental results on rice leaf datasets demonstrate that the hybrid CVT with SA model achieves 98.12% feature extraction accuracy and 98.56% classification accuracy on dataset 1, and 98.26% feature extraction accuracy and 98.67% classification accuracy on dataset 2, across multiple rice leaf categories, outperforming baseline CNN and ViT models. Spatial attention heat maps highlight the most important locations during the decision-making process, making the model more interpretable. This hybrid CVT model offers a scalable solution for rice leaf disease detection and categorization, with potential applications in precision agriculture systems, including drone-based or mobile implementations for field monitoring. The presented model outperforms traditional methods.
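The abstract describes an SA module that reweights CNN feature maps so diseased leaf regions receive greater weight than background. The paper's exact design is not given here, so the sketch below assumes a common spatial-attention formulation (channel-wise average and max pooling combined into a sigmoid mask); the function name and the fixed pooling weights are illustrative, not the authors' implementation.

```python
import numpy as np

def spatial_attention(feat, w=None, b=0.0):
    """Sketch of a spatial-attention reweighting step (assumed design:
    channel pooling -> 1x1 combination -> sigmoid mask, as in common
    SA modules; the paper's exact module may differ).

    feat : CNN feature map of shape (C, H, W).
    w, b : weights of the 1x1 combination over the two pooled maps
           (learned in practice; fixed equal weights by default here).
    Returns the reweighted feature map and the (H, W) attention mask.
    """
    # Summarize activations across channels at each spatial location.
    avg_pool = feat.mean(axis=0)              # (H, W)
    max_pool = feat.max(axis=0)               # (H, W)
    stacked = np.stack([avg_pool, max_pool])  # (2, H, W)

    # 1x1 "convolution": weighted sum of the two pooled maps.
    if w is None:
        w = np.array([0.5, 0.5])
    logits = np.tensordot(w, stacked, axes=1) + b  # (H, W)

    # Sigmoid gives per-location weights in (0, 1); multiplying the
    # feature map by this mask emphasizes salient (e.g., diseased)
    # regions and suppresses background.
    mask = 1.0 / (1.0 + np.exp(-logits))
    return feat * mask, mask

# Example: an 8-channel 4x4 feature map.
feat = np.random.rand(8, 4, 4)
reweighted, mask = spatial_attention(feat)
```

The returned mask is also what a heat-map visualization (as mentioned in the abstract) would display over the input image for interpretability.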
