CLIP-CMDF Enhanced Vision Language Models with Novel GAN for Hematological Analysis: A Text-Guided White Blood Cell Identification Framework

Abstract

This paper presents a novel approach combining CLIP (Contrastive Language-Image Pre-training) with a CMDF (Cross-Modal Dynamic Filters) methodology, enhanced by two specialized Generative Adversarial Networks (a Saliency-Consistent Cycle GAN and a Policy-Augmented Robust GAN), to address text-guided white blood cell classification on the Raabin dataset. Our hybrid framework tackles the challenging problem of associating arbitrary descriptive sentences with specific leukocyte types, including morphologically complex cells such as basophils and eosinophils. The proposed CLIP-CMDF architecture leverages vision-language understanding while incorporating multi-scale feature extraction for semantic-visual alignment. A novel GAN architecture generates balanced text-image pairs to mitigate class imbalance in the dataset. Experimental results demonstrate 80% accuracy, competitive with state-of-the-art medical vision-language models including Med-PaLM M (78.5%) and GPT-4V Medical (77.2%). This research establishes a new benchmark for text-guided hematological analysis and provides a reproducible framework for sentence-to-cell-type association tasks. The implementation source code is accessible via the following link.
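For illustration, the following is a minimal, hypothetical PyTorch sketch of the cross-modal dynamic filtering idea the abstract describes: a text embedding is projected into per-sample depthwise convolution kernels that modulate the image feature map, and the filtered features are scored by cosine similarity against class-prompt embeddings. All names (`CrossModalDynamicFilter`, `filter_gen`), shapes, and the stand-in random tensors are assumptions for exposition, not the authors' implementation.

```python
# Hypothetical sketch of a text-conditioned (cross-modal) dynamic filter.
# Stand-in random tensors replace real CLIP image/text encoders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalDynamicFilter(nn.Module):
    def __init__(self, embed_dim=512, feat_channels=256, kernel_size=3):
        super().__init__()
        self.feat_channels = feat_channels
        self.kernel_size = kernel_size
        # Project the text embedding into one depthwise kernel per channel.
        self.filter_gen = nn.Linear(embed_dim, feat_channels * kernel_size**2)
        # Project pooled, filtered image features into the joint embedding space.
        self.proj = nn.Linear(feat_channels, embed_dim)

    def forward(self, image_feats, text_emb):
        # image_feats: (B, C, H, W) backbone features; text_emb: (B, D).
        b, c, h, w = image_feats.shape
        k = self.kernel_size
        weights = self.filter_gen(text_emb).view(b * c, 1, k, k)
        # Grouped conv applies each sample's text-conditioned filters
        # to that sample's own feature channels.
        feats = image_feats.reshape(1, b * c, h, w)
        feats = F.conv2d(feats, weights, padding=k // 2, groups=b * c)
        feats = feats.view(b, c, h, w).mean(dim=(2, 3))  # global average pool
        return F.normalize(self.proj(feats), dim=-1)

# Usage with stand-in tensors in place of real CLIP encoders:
cmdf = CrossModalDynamicFilter()
image_feats = torch.randn(4, 256, 14, 14)               # backbone feature map
text_emb = F.normalize(torch.randn(4, 512), dim=-1)     # sentence embedding
class_embs = F.normalize(torch.randn(5, 512), dim=-1)   # 5 WBC class prompts
logits = cmdf(image_feats, text_emb) @ class_embs.t()   # cosine-similarity logits
print(logits.shape)  # torch.Size([4, 5])
```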
