Semantically-Guided State-Space Models for Data-Efficient and Robust Cross-Platform Virtual Staining
Abstract
Histologic staining is a cornerstone of clinical pathology, enabling the visualization of cellular structures and facilitating accurate diagnoses. However, traditional staining methods are labor-intensive, prone to inconsistency, and dependent on reagent concentrations and operator expertise. Virtual staining offers an efficient alternative but faces challenges, including limited cross-platform adaptability and a heavy reliance on paired training data. Here, we present a virtual staining framework that integrates Mamba state-space models with cycle-consistent adversarial networks (CycleGAN), significantly reducing data requirements while maintaining or improving staining quality. Our approach incorporates three key innovations: (1) an efficient, high-quality virtual staining method based on Mamba state-space models, enhanced by modules such as Adaptive Frequency Filtering Upsampling (AFFU), with robust cross-platform generalization; (2) a highly efficient entropy-hue-guided data selection strategy that drastically reduces data requirements and is potentially applicable to other data-scarce domains in biomedical imaging; and (3) a multi-level semantic guidance approach that uses vision-language models to inject domain knowledge, improving feature preservation and cross-modal adaptability. We validated our approach on two distinct microscopy platforms: a UV photoacoustic microscopy system with a 40× objective lens and a Zeiss AxioScan scanner with a 20× objective lens. For H&E virtual staining from label-free UV photoacoustic images, our method required only 12.5% of the data used by baseline CycleGAN models while improving FID from 51.13 to 41.03 (a 19.8% reduction). For H&E to Masson's trichrome conversion on the Zeiss system, our approach used only 38.5% of the data while improving FID from 17.06 to 13.26 (a 22.3% reduction) and achieving a structural similarity index of 0.984. Our framework requires only 2-3 complete tissue sections for training on a new microscopy platform, with inference times under 3 minutes per whole-slide image on a standard workstation (NVIDIA RTX 3090). This approach reduces staining time from 24-72 hours to minutes while preserving essential morphological features, offering potential for rapid pathological screening and diagnosis in resource-constrained settings.
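To make the entropy-hue-guided selection idea concrete, the sketch below shows one plausible way such a criterion could be implemented; it is an illustrative assumption, not the paper's implementation. Patches are scored by grayscale Shannon entropy (texture/information content) and by the fraction of saturated pixels whose hue falls within a stain-relevant band; names such as `patch_entropy`, `hue_fraction`, and `select_patches`, as well as all thresholds, are hypothetical.

```python
# Hypothetical sketch of an entropy-hue guided patch selection step.
# The exact scoring and thresholds used in the paper are not given in the
# abstract; all names and numbers here are illustrative assumptions.
import numpy as np
from skimage.color import rgb2hsv
from skimage.measure import shannon_entropy


def patch_entropy(patch_rgb: np.ndarray) -> float:
    """Shannon entropy of the grayscale patch (H x W x 3 uint8 RGB input)."""
    gray = patch_rgb.mean(axis=-1).astype(np.uint8)
    return shannon_entropy(gray)


def hue_fraction(patch_rgb: np.ndarray, hue_lo: float = 0.6,
                 hue_hi: float = 0.9, sat_min: float = 0.1) -> float:
    """Fraction of sufficiently saturated pixels whose hue lies in a
    stain-relevant band (band limits are illustrative, not from the paper)."""
    hsv = rgb2hsv(patch_rgb.astype(np.float32) / 255.0)
    hue, sat = hsv[..., 0], hsv[..., 1]
    mask = sat > sat_min
    if not mask.any():
        return 0.0
    in_band = (hue[mask] >= hue_lo) & (hue[mask] <= hue_hi)
    return float(in_band.mean())


def select_patches(patches, entropy_min=4.0, hue_min=0.05, budget=None):
    """Keep patches that are both texture-rich and contain stained tissue,
    rank them by a combined score, and optionally keep only the top `budget`."""
    scored = []
    for p in patches:
        e, h = patch_entropy(p), hue_fraction(p)
        if e >= entropy_min and h >= hue_min:
            scored.append((e * h, p))
    scored.sort(key=lambda t: t[0], reverse=True)
    kept = [p for _, p in scored]
    return kept[:budget] if budget is not None else kept
```

A budgeted selection of this kind is one way a framework could reduce the training set to a small number of whole tissue sections, though the specific scoring used to achieve the data reductions reported above is defined in the paper itself.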