Cross-Domain Tomato Disease Classification via Flexible Contrastive Clustering in Vision-Language Models
Abstract
Plant disease detection systems face significant challenges in cross-domain generalization, particularly when transitioning from controlled laboratory settings to diverse field conditions. Traditional deep learning approaches exhibit severe performance degradation across different imaging environments, limiting practical deployment in real-world agricultural scenarios. This paper introduces a novel Flexible Contrastive Clustering (FCC) framework for zero-shot tomato disease classification that addresses fundamental generalization limitations through vision-language learning. Unlike standard CLIP’s one-to-one image-text pairing, our method leverages one-to-many relationships in which each disease image is associated with multiple diverse textual descriptions, enabling robust representation learning across linguistic variations. The FCC framework optimizes class-based clustering in the joint embedding space through a specialized loss function that treats all same-class descriptions as positives, enabling effective handling of both seen and unseen disease categories during zero-shot evaluation. We train on PlantDoc data (740 images) and test across four diverse tomato disease datasets totaling 17,313 images, spanning laboratory and field conditions. Experimental results demonstrate substantial improvements over state-of-the-art vision-language models, with an average accuracy of 30.15% and an average weighted F1-score of 28.05% across all test datasets. Our method shows particularly strong performance on field datasets, achieving 59.70% accuracy on FieldPlant and 26.52% on Tomato Village, significantly outperforming existing approaches. Attention visualization analysis reveals effective disease localization for both seen and unseen categories, validating the practical applicability of our approach for real-world agricultural monitoring systems.
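The core departure from CLIP described above, treating every same-class textual description as a positive rather than enforcing a one-to-one pairing, can be illustrated with a multi-positive contrastive loss. The sketch below is a minimal NumPy illustration of this general idea (in the style of supervised contrastive learning), not the paper's actual FCC loss; the function name, signature, and temperature value are assumptions for illustration.

```python
import numpy as np

def multi_positive_contrastive_loss(image_emb, text_emb,
                                    img_labels, txt_labels,
                                    temperature=0.07):
    """Illustrative multi-positive contrastive loss (not the paper's exact
    FCC objective): for each image, every text description sharing its
    class label counts as a positive.

    image_emb: (n_img, d) image embeddings
    text_emb:  (n_txt, d) text embeddings (several descriptions per class)
    img_labels, txt_labels: integer class labels per embedding
    """
    # L2-normalize so dot products are cosine similarities
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Similarity logits between every image and every text description
    logits = img @ txt.T / temperature                      # (n_img, n_txt)

    # Positive mask: 1 where the text belongs to the image's class
    pos = (img_labels[:, None] == txt_labels[None, :]).astype(float)

    # Numerically stable log-softmax over all text descriptions
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Average negative log-likelihood over each image's positive set
    loss_per_image = -(pos * log_prob).sum(axis=1) / pos.sum(axis=1)
    return loss_per_image.mean()
```

With this formulation, pulling an image toward any of its class's descriptions lowers the loss, which is what allows diverse phrasings of the same disease to cluster together in the joint embedding space.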