Comparative analysis of convolutional and vision transformer models for automated leukocyte classification enhanced by generative color augmentation
Abstract
Manual differential leukocyte counting is a critical yet time-consuming and observer-dependent process in clinical hematology. This study presents a comparative analysis of You Only Look Once v11 (YOLOv11) and Vision Transformer (ViT) architectures for the classification of 14 leukocyte types and artifacts using a private clinical dataset. We further investigated the impact of HistAuGAN, a domain-specific data augmentation strategy designed to simulate real-world staining variability. Across experimental settings, ViT models achieved higher overall performance than YOLOv11 variants, and the application of HistAuGAN led to systematic improvements in both architectural families. The best-performing configuration, trained on the HistAuGAN-augmented dataset, achieved a macro F1-score of 98.36% and an overall accuracy of 99.75% on the validation set. To assess generalization capacity, this configuration was additionally evaluated on the public PBC and LISC datasets, demonstrating meaningful cross-dataset performance without architectural modification. Model interpretability was examined through attention- and activation-based saliency analyses, indicating that predictions were primarily driven by morphologically relevant leukocyte regions rather than background structures. These findings suggest that combining global-context modeling with domain-informed augmentation provides a robust and clinically coherent framework for fine-grained leukocyte classification.
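The abstract reports a macro F1-score (98.36%) that is lower than the overall accuracy (99.75%). The sketch below shows one way such a gap can arise; it is not a reconstruction of the study's evaluation. Accuracy counts every sample equally. Macro F1 averages per-class F1 scores with equal weight per class, so errors on rare cell types lower it more than they lower accuracy. The labels, class names, and helper functions (`per_class_f1`, `macro_f1`, `accuracy`) are hypothetical toy values chosen for illustration and are not drawn from the private dataset.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating `label` as positive and all others as negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is set to 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: each class counts equally, regardless of size."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, c) for c in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly: each sample counts equally."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced toy set: 98 common cells, 2 rare cells, one rare cell misclassified.
y_true = ["neutrophil"] * 98 + ["basophil"] * 2
y_pred = ["neutrophil"] * 98 + ["basophil", "neutrophil"]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")  # 0.9900
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")  # 0.8308
```

In this toy case, a single error on the rare class costs one point of accuracy. It costs about 16 points of macro F1, because the rare class's F1 (about 0.67) carries the same weight as the common class's F1 (about 0.995).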