ConvDeiT-Tiny: Adding Local Inductive Bias to DeiT-Ti for Enhanced Maize Leaf Disease Classification

Abstract

Reliable identification of maize leaf diseases is critical for mitigating crop losses, particularly in regions where farmers have limited access to experts. Although vision transformers (ViTs) have recently demonstrated strong performance in image recognition, their weak inductive bias and limited modelling of local texture patterns make them poorly suited to fine-grained maize leaf disease classification. To address these limitations, we propose ConvDeiT-Tiny, a lightweight hybrid ViT that improves DeiT-Ti by placing depthwise convolutions in parallel with the multi-head self-attention modules in the first three transformer blocks. The local and global features captured by the convolution and attention modules are concatenated along the embedding dimension and fused by a multilayer perceptron, yielding richer token representations without a significant increase in model size. Across three datasets, ConvDeiT-Tiny (6.9M parameters) consistently outperformed DeiT-Ti, DeiT-Ti-Distilled, and DeiT-S (21.7M parameters) when trained from scratch. With transfer learning, ConvDeiT-Tiny achieved accuracies of 99.15%, 99.35%, and 98.60% on the CD&S, primary, and Kaggle datasets, respectively, surpassing many previous studies with far fewer parameters. For explainability, we present gradient-weighted transformer attribution visualizations that highlight the disease lesions driving model predictions. These results indicate that injecting local inductive bias into early transformer blocks is beneficial for accurate maize leaf disease classification.
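The abstract describes the hybrid block as a depthwise-convolution branch running in parallel with self-attention, with the two outputs concatenated along the embedding dimension and fused by an MLP. A minimal NumPy sketch of that data flow is given below. It is an illustration only, not the authors' implementation: it uses a single attention head (the paper uses multi-head attention), omits normalization, residual connections, and biases, and all parameter names and shapes are assumptions chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Global branch: single-head self-attention over all tokens
    # (the actual model uses multi-head attention).
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return scores @ v

def depthwise_conv3x3(x, h, w, kernels):
    # Local branch: x is (h*w, d) tokens on an h x w grid;
    # kernels is (d, 3, 3), one 3x3 filter per channel (depthwise).
    d = x.shape[-1]
    grid = x.reshape(h, w, d)
    padded = np.pad(grid, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(grid)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3, :]            # (3, 3, d)
            out[i, j] = np.einsum('xyc,cxy->c', patch, kernels)
    return out.reshape(h * w, d)

def hybrid_block(x, h, w, p):
    # Run the local (conv) and global (attention) branches in parallel,
    # concatenate along the embedding dimension, fuse with a 2-layer MLP.
    local_feats = depthwise_conv3x3(x, h, w, p['dw'])
    global_feats = self_attention(x, p['wq'], p['wk'], p['wv'])
    fused = np.concatenate([local_feats, global_feats], axis=-1)  # (n, 2d)
    hidden = np.maximum(fused @ p['w1'], 0.0)                     # ReLU MLP
    return hidden @ p['w2']                                       # back to width d

rng = np.random.default_rng(0)
h = w = 4                      # assumed tiny 4x4 token grid for the demo
d = 8                          # assumed embedding width for the demo
params = {
    'dw': rng.standard_normal((d, 3, 3)) * 0.1,
    'wq': rng.standard_normal((d, d)) * 0.1,
    'wk': rng.standard_normal((d, d)) * 0.1,
    'wv': rng.standard_normal((d, d)) * 0.1,
    'w1': rng.standard_normal((2 * d, 4 * d)) * 0.1,
    'w2': rng.standard_normal((4 * d, d)) * 0.1,
}
x = rng.standard_normal((h * w, d))
y = hybrid_block(x, h, w, params)
print(y.shape)  # (16, 8): token count preserved, width fused back to d
```

The key property the sketch demonstrates is that the fusion MLP maps the concatenated width 2d back to d, so the hybrid block is a drop-in replacement for a standard transformer block.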
