Language of Stains: Tokenization Enhances Multiplex Immunofluorescence and Histology Image Synthesis

Abstract

Multiplex tissue imaging (MTI) is a powerful tool in cancer research, allowing spatially resolved, single-cell phenotype analysis. However, MTI platforms face challenges such as high costs, tissue loss, lengthy acquisition times, and complex analysis of large, multichannel images with batch effects. To address these challenges, we propose a novel computational method to model the interactions between dozens of panel markers and Hematoxylin & Eosin (H&E) staining, enabling in silico generation of marker stains. This approach reduces the reliance on experimentally measured markers, bridging low-cost H&E data with MTI’s high-content information. Our approach uses a two-stage framework for channel-wise bioimage synthesis: first, vector quantization learns a visual token vocabulary, then a bidirectional transformer infers missing markers through masked language modeling. Comprehensive benchmarking across different MTI platforms and tissue types demonstrates the effectiveness of our method in improving marker prediction while maintaining biological relevance. This advance makes high-dimensional multiplex tissue imaging more accessible and scalable, supporting deeper insights and potential clinical applications in cancer research.
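
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the codebook here is random rather than learned, the embedding sizes are arbitrary, and the helper names `quantize` and `mask_channels` are hypothetical. It only shows how per-channel patch embeddings might be turned into discrete tokens by nearest-codebook lookup, and how the tokens of a missing marker channel might be replaced by a mask id before being passed to a bidirectional transformer (which is omitted).

```python
import numpy as np

def quantize(patches, codebook):
    """Map each patch embedding (N, D) to the index of its nearest code (K, D)."""
    # Squared Euclidean distance between every patch and every codebook vector.
    d2 = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def mask_channels(tokens, missing, mask_id):
    """Replace all tokens of the channels to be inferred with a mask id."""
    masked = tokens.copy()
    masked[missing, :] = mask_id
    return masked

rng = np.random.default_rng(0)

# Assumption: a tiny random codebook with K=8 codes of dimension D=4.
# In a real system this vocabulary would be learned in stage one.
codebook = rng.normal(size=(8, 4))

# Assumption: 3 channels (e.g. H&E plus two markers), 5 patch embeddings each.
embeddings = rng.normal(size=(3, 5, 4))

# Stage 1 (sketch): tokenize every channel against the shared codebook.
tokens = np.stack([quantize(ch, codebook) for ch in embeddings])  # shape (3, 5)

# Stage 2 input (sketch): hide channel 2 so a model could be asked to infer it.
MASK_ID = codebook.shape[0]  # an id reserved outside the token vocabulary
model_input = mask_channels(tokens, missing=[2], mask_id=MASK_ID)
```

In this sketch, `model_input` keeps the observed channels' tokens unchanged and carries only mask ids for the hidden channel; a masked-language-modeling objective would then train a model to predict the original tokens at those masked positions.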
