Comparative analysis of convolutional and vision transformer models for automated leukocyte classification enhanced by generative color augmentation


Abstract

Manual differential leukocyte counting is a critical yet time-consuming and observer-dependent process in clinical hematology. This study presents a comparative analysis of You Only Look Once v11 (YOLOv11) and Vision Transformer (ViT) architectures for the classification of 14 leukocyte types and artifacts using a private clinical dataset. We further investigated the impact of HistAuGAN, a domain-specific data augmentation strategy designed to simulate real-world staining variability. Across experimental settings, ViT models achieved higher overall performance than YOLOv11 variants, and the application of HistAuGAN led to systematic improvements in both architectural families. The best-performing configuration, trained on the HistAuGAN-augmented dataset, achieved a macro F1-score of 98.36% and an overall accuracy of 99.75% on the validation set. To assess generalization capacity, this configuration was additionally evaluated on the public PBC and LISC datasets, demonstrating meaningful cross-dataset performance without architectural modification. Model interpretability was examined through attention- and activation-based saliency analyses, indicating that predictions were primarily driven by morphologically relevant leukocyte regions rather than background structures. These findings suggest that combining global-context modeling with domain-informed augmentation provides a robust and clinically coherent framework for fine-grained leukocyte classification.
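The gap between the reported macro F1-score (98.36%) and overall accuracy (99.75%) is what one would expect when classes are imbalanced: accuracy weights every cell equally, while macro F1 weights every class equally, so errors on rare classes lower it more. The sketch below illustrates this with the standard definitions on made-up toy labels. The class names, counts, and helper functions are illustrative assumptions, not the study's data or evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: one common class and one rare class.
y_true = ["neutrophil"] * 95 + ["basophil"] * 5
# All common-class cells are correct; one rare-class cell is misclassified.
y_pred = ["neutrophil"] * 95 + ["basophil"] * 4 + ["neutrophil"]

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.4f}, macro F1 = {mf1:.4f}")
```

In this toy case a single error on the rare class leaves accuracy almost unchanged but lowers macro F1 noticeably, which is why both metrics are worth reporting for a 14-class problem with uneven class frequencies.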
