An Efficient Hybrid Convolutional Vision Transformer Framework with Spatial Attention for Rice Leaf Disease Identification and Categorization


Abstract

Disease detection and categorization in rice leaves plays a crucial role in mitigating crop damage and supporting sustainable agriculture. Traditional approaches, which often rely on manual inspection, are limited by labor intensity, variability, and error susceptibility. This paper introduces a Hybrid Convolutional Vision Transformer (CVT) model with Spatial Attention (SA) to enhance detection accuracy and classification reliability for rice leaves. The proposed CVT framework integrates a Convolutional Neural Network (CNN) backbone for initial feature extraction with a Vision Transformer (ViT) for advanced feature representation. The CNN captures essential textures and shapes, while the ViT applies attention across image patches, effectively learning the complex spatial dependencies necessary for identifying disease-specific characteristics within diverse field environments. Further, the SA module refines the model by assigning greater weight to diseased regions, reducing interference from non-leaf background areas. Experimental results on rice leaf datasets demonstrate that the hybrid CVT with SA model achieves 98.12% feature extraction accuracy and 98.56% classification accuracy on dataset 1, and 98.26% feature extraction accuracy and 98.67% classification accuracy on dataset 2, across multiple rice leaf categories, outperforming baseline CNN and ViT models. Spatial attention heat maps highlight the most important locations during the decision-making process, making the model more interpretable. This hybrid CVT model offers a scalable solution for rice leaf disease detection and categorization, with potential applications in precision agriculture systems, including drone-based or mobile implementations for field monitoring. The presented model outperforms traditional methods.
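The abstract describes an SA module that reweights CNN feature maps so diseased leaf regions receive greater weight than background. The paper's exact design is not given here, so the sketch below assumes a common spatial-attention formulation (channel-wise average and max pooling combined into a sigmoid mask); the function name and the fixed pooling weights are illustrative, not the authors' implementation.

```python
import numpy as np

def spatial_attention(feat, w=None, b=0.0):
    """Sketch of a spatial-attention reweighting step (assumed design:
    channel pooling -> 1x1 combination -> sigmoid mask, as in common
    SA modules; the paper's exact module may differ).

    feat : CNN feature map of shape (C, H, W).
    w, b : weights of the 1x1 combination over the two pooled maps
           (learned in practice; fixed equal weights by default here).
    Returns the reweighted feature map and the (H, W) attention mask.
    """
    # Summarize activations across channels at each spatial location.
    avg_pool = feat.mean(axis=0)              # (H, W)
    max_pool = feat.max(axis=0)               # (H, W)
    stacked = np.stack([avg_pool, max_pool])  # (2, H, W)

    # 1x1 "convolution": weighted sum of the two pooled maps.
    if w is None:
        w = np.array([0.5, 0.5])
    logits = np.tensordot(w, stacked, axes=1) + b  # (H, W)

    # Sigmoid gives per-location weights in (0, 1); multiplying the
    # feature map by this mask emphasizes salient (e.g., diseased)
    # regions and suppresses background.
    mask = 1.0 / (1.0 + np.exp(-logits))
    return feat * mask, mask

# Example: an 8-channel 4x4 feature map.
feat = np.random.rand(8, 4, 4)
reweighted, mask = spatial_attention(feat)
```

The returned mask is also what a heat-map visualization (as mentioned in the abstract) would display over the input image for interpretability.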
