A lightweight convolutional and vision transformer hybrid network for parameter-efficient plant disease classification


Abstract

Accurate plant disease classification for edge deployment requires models that are both precise and efficient. Convolutional neural networks (CNNs) learn local lesion patterns effectively, while pure Transformers capture global dependencies more explicitly but typically incur a higher computational cost. We propose CViTLw, a lightweight CNN-Transformer hybrid comprising a MobileNetV2 branch, a compact Vision Transformer branch, and an attention-enhanced cross-fusion module. We further evaluate two lightweight attention mechanisms, SE and CBAM, within the same framework. Experiments on the PlantVillage and Maize Leaf Disease datasets are conducted under both controlled and field-acquired conditions. On the Maize Leaf Disease dataset, CViTLw-SE attains 94.85% accuracy with only 0.33M parameters. On PlantVillage, both CViTLw-SE and CViTLw-CB reach 99.74% in accuracy, precision, recall, and F1-score, with an AUC of 100%. The most compact variants run in under 2 ms per image, exceeding 500 FPS. Overall, CViTLw achieves a strong balance of accuracy, efficiency, and practical deployability. The source code is publicly available at \href{https://drive.google.com/file/d/1nHDZNomhAC7GNLEdAeMyH7qJq3yqh45W/view?usp=drive_link}{this link}.
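The squeeze-and-excitation (SE) mechanism the abstract references can be illustrated with a minimal, framework-free sketch. This is not the paper's implementation: the weight shapes, the reduction size, and the function name `se_attention` are illustrative assumptions; a real model would learn the two fully connected weight matrices during training.

```python
import math

def se_attention(feature_map, w1, w2):
    """Sketch of squeeze-and-excitation channel attention (illustrative, not the paper's code).

    feature_map: list of C channels, each an H x W nested list of floats.
    w1: C x C_r weight matrix for the squeeze FC layer (C_r = reduced width).
    w2: C_r x C weight matrix for the excitation FC layer.
    Returns the feature map with each channel rescaled by a sigmoid gate in (0, 1).
    """
    C = len(feature_map)
    # Squeeze: global average pooling collapses each channel to one scalar.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    C_r = len(w1[0])
    # Excitation: FC -> ReLU -> FC -> sigmoid produces one gate per channel.
    h = [max(0.0, sum(z[i] * w1[i][j] for i in range(C))) for j in range(C_r)]
    s = [1.0 / (1.0 + math.exp(-sum(h[j] * w2[j][k] for j in range(C_r))))
         for k in range(C)]
    # Reweight: scale every spatial value of channel c by its gate s[c].
    return [[[v * s[c] for v in row] for row in feature_map[c]] for c in range(C)]

# Toy usage: 2 channels of 2x2 activations, hand-picked weights.
fm = [[[1.0, 1.0], [1.0, 1.0]],
      [[2.0, 2.0], [2.0, 2.0]]]
w1 = [[1.0], [1.0]]        # C=2 -> C_r=1
w2 = [[0.0, 0.0]]          # zero excitation weights -> sigmoid(0) = 0.5 gates
out = se_attention(fm, w1, w2)
```

With zero excitation weights every gate is sigmoid(0) = 0.5, so each channel is simply halved; trained weights would instead emphasize channels that carry disease-relevant lesion features.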
