A lightweight convolutional and vision transformer hybrid network for parameter-efficient plant disease classification
Abstract
Accurate plant disease classification for edge deployment requires models that are both precise and efficient. Convolutional neural networks (CNNs) learn local lesion patterns effectively, while pure Transformers capture global dependencies more explicitly but typically incur higher computational cost. We propose CViTLw, a lightweight CNN-Transformer hybrid comprising a MobileNetV2 branch, a compact Vision Transformer branch, and an attention-enhanced cross-fusion module. We further evaluate two lightweight attention mechanisms, SE and CBAM, within the same framework. Experiments are conducted on the PlantVillage and Maize Leaf Disease datasets, covering both controlled and field-acquired conditions. On the Maize Leaf Disease dataset, CViTLw-SE attains 94.85% accuracy with only 0.33M parameters. On PlantVillage, both CViTLw-SE and CViTLw-CB achieve 99.74% accuracy, precision, recall, and F1-score, with an AUC of 100%. The most compact variants process an image in under 2 ms and exceed 500 FPS. Overall, CViTLw achieves a strong balance of accuracy, efficiency, and practical deployability. The source code is publicly available at \href{https://drive.google.com/file/d/1nHDZNomhAC7GNLEdAeMyH7qJq3yqh45W/view?usp=drive_link}{this link}.
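To make the SE (Squeeze-and-Excitation) attention mentioned in the abstract concrete, the sketch below shows the squeeze-excitation-scale steps on a single feature map in NumPy. It is an illustrative sketch only, not the paper's implementation: the shapes, weight matrices `w1`/`w2`, and reduction ratio `r` are assumptions chosen for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation recalibration of a (C, H, W) feature map.

    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights.
    """
    # Squeeze: global average pooling over spatial dims -> per-channel stats (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: bottleneck MLP (C -> C//r -> C), ReLU then sigmoid gate in (0, 1)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))
    # Scale: reweight each channel by its learned importance
    return x * s[:, None, None]

# Toy example with an assumed reduction ratio r = 4
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y = se_block(x, w1, w2)
```

Because the sigmoid gate lies strictly in (0, 1), the block only attenuates channels, never amplifies them; CBAM extends this idea with an additional spatial attention map.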