Multimodal Vision-Language Framework for Text-Guided Leukemia Classification Using Advanced Deep Learning Architectures

Abstract

Hematological malignancy classification faces significant challenges in correlating descriptive textual content with specific leukemia subtypes (AML, ALL, CLL, CML), requiring advanced multimodal approaches for accurate diagnosis. We developed a novel framework that integrates a Vision-and-Language Transformer (ViLT) with a Multi-Domain Feature Aggregation (MDFA) methodology for text-guided leukemia classification. The architecture incorporates a Consistency-Aware CycleGAN to synthesize balanced text-image combinations and mitigate class distribution disparities, and leverages hierarchical feature extraction for enhanced semantic-visual correspondence. Experimental validation on the Raabin hematological dataset achieved 76.8% classification accuracy, outperforming contemporary medical vision-language architectures, including MedCLIP-SAM (73.8%) and BiomedGPT-V (72.1%). This work establishes a new benchmark for text-driven hematological malignancy analysis and provides a reproducible methodology for correlating textual descriptions with leukemia subtypes in clinical diagnostics.
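
The abstract does not include implementation details, but a minimal sketch of the text-guided classification stage can be assembled from public components. The example below pairs the open ViLT checkpoint dandelin/vilt-b32-mlm (Hugging Face Transformers) with a plain linear head over the pooled multimodal embedding; that head is a hypothetical stand-in for the paper's MDFA module, and the Consistency-Aware CycleGAN balancing stage is omitted, since neither is publicly released.

```python
import torch.nn as nn
from transformers import ViltModel, ViltProcessor

# Four-class taxonomy taken directly from the abstract.
LEUKEMIA_SUBTYPES = ["AML", "ALL", "CLL", "CML"]


class ViltLeukemiaClassifier(nn.Module):
    """ViLT backbone with a linear head over the pooled text-image embedding.

    The linear head is a stand-in for the paper's (unreleased) MDFA stage.
    """

    def __init__(self, num_classes: int = len(LEUKEMIA_SUBTYPES)):
        super().__init__()
        self.backbone = ViltModel.from_pretrained("dandelin/vilt-b32-mlm")
        self.head = nn.Linear(self.backbone.config.hidden_size, num_classes)

    def forward(self, **inputs):
        # pooler_output: the joint text-image representation, shape (batch, 768).
        pooled = self.backbone(**inputs).pooler_output
        return self.head(pooled)  # logits over the four subtypes


processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-mlm")
model = ViltLeukemiaClassifier()

# Usage: pair a stained-smear image (a PIL.Image) with its guiding description.
# inputs = processor(images=image, text="myeloblasts with Auer rods",
#                    return_tensors="pt")
# logits = model(**inputs)
# prediction = LEUKEMIA_SUBTYPES[logits.argmax(dim=-1).item()]
```

This mirrors the text-guided setup described in the abstract in that image and report text are fused by a single transformer before classification; the paper's reported 76.8% accuracy depends on its MDFA and CycleGAN components, which this sketch does not reproduce.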
