Inferring spatial gene expression from tissue images using large-scale histology foundation model with SpaFoundation
Abstract
Spatial transcriptomics (ST) has revolutionized biological research by enabling the joint profiling of gene expression and spatial context, along with histological images. However, existing ST technologies remain costly and time-consuming, which hinders their broader clinical application. Although computational methods have been developed to infer gene expression directly from histology images, these methods still suffer from limited accuracy and spatial resolution due to insufficient training data and model capacity. Here, we introduce SpaFoundation, a large-scale histology foundation model designed to accurately predict spatial gene expression from tissue images. SpaFoundation employs a teacher-student Vision Transformer (ViT) architecture to learn generalizable histological representations by modeling potential dependencies among image patches. Notably, we combine self-distillation with masked image modeling (MIM) to capture both high-level semantic representations and fine-grained structural features, enriching the representation of each spot. The model has 80 million parameters and is pretrained on 1.79 million patches spanning 26 tissue types. We validated SpaFoundation on 117 samples, demonstrating its flexibility across different spatial resolutions and superior performance in spatial gene expression prediction, as well as strong transferability to downstream tasks such as tumor detection and spatial domain clustering. Our results highlight the potential of large-scale foundation models to learn informative histological representations and underscore the benefits of domain-specific pretraining in extracting task-relevant representations, paving the way for foundation model-driven spatial gene expression inference. The implementation and pretrained weights of SpaFoundation are publicly available at https://github.com/NingZhangCSUBio/SpaFoundation.
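The abstract names the training recipe (a teacher-student ViT trained with self-distillation plus masked image modeling) without giving its equations. The following is a minimal, framework-free sketch of how such a joint objective is commonly set up. It is not the SpaFoundation implementation. The function names, the temperatures, the momentum value, and the choice of cross-entropy for both terms are illustrative assumptions, and the logits stand in for the outputs of the two networks.

```python
import math

def log_softmax(logits, temperature):
    """Numerically stable log-softmax of a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]

def cross_entropy(teacher_logits, student_logits, t_teacher=0.04, t_student=0.1):
    """Cross-entropy from the teacher distribution to the student distribution.

    The temperatures are assumed values; a sharper teacher is a common choice.
    """
    p_teacher = [math.exp(v) for v in log_softmax(teacher_logits, t_teacher)]
    log_p_student = log_softmax(student_logits, t_student)
    return -sum(p * lp for p, lp in zip(p_teacher, log_p_student))

def joint_loss(teacher_cls, student_cls, teacher_patches, student_patches, masked):
    """Self-distillation on the image-level token plus MIM on masked patches.

    teacher_patches / student_patches: one logit list per patch.
    masked: one bool per patch, True where the student's input was masked.
    """
    cls_term = cross_entropy(teacher_cls, student_cls)
    masked_terms = [
        cross_entropy(t, s)
        for t, s, m in zip(teacher_patches, student_patches, masked)
        if m
    ]
    mim_term = sum(masked_terms) / len(masked_terms) if masked_terms else 0.0
    return cls_term + mim_term

def ema_update(teacher_weights, student_weights, momentum=0.996):
    """Move the teacher toward the student by an exponential moving average."""
    return [
        momentum * t + (1.0 - momentum) * s
        for t, s in zip(teacher_weights, student_weights)
    ]

# Toy usage: one image with three patches, the last two masked for the student.
loss = joint_loss(
    teacher_cls=[2.0, 0.5, -1.0],
    student_cls=[1.5, 0.7, -0.8],
    teacher_patches=[[0.1, 0.2], [1.0, -1.0], [0.3, 0.9]],
    student_patches=[[0.0, 0.3], [0.8, -0.5], [0.1, 1.1]],
    masked=[False, True, True],
)
new_teacher = ema_update([1.0, 2.0], [0.0, 4.0])
```

In this kind of setup only the student receives gradients, while the teacher is refreshed through `ema_update` after each step. The image-level term encourages consistent semantic representations, and the masked-patch term forces the student to reconstruct local structure from context.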