scGPT-spatial: Continual Pretraining of Single-Cell Foundation Model for Spatial Transcriptomics

Abstract

Spatial transcriptomics has emerged as a pivotal technology for profiling gene expression of cells within their spatial context. The rapid growth of publicly available spatial data presents an opportunity to further our understanding of microenvironments that drive cell fate decisions and disease progression. However, existing foundation models, largely pretrained on single-cell RNA sequencing (scRNA-seq) data, fail to resolve the spatial relationships among samples or capture the unique distributions from various sequencing protocols. We introduce scGPT-spatial, a specialized foundation model for spatial transcriptomics obtained by continual pretraining of our previously published scGPT scRNA-seq foundation model. We also curate SpatialHuman30M, a comprehensive spatial transcriptomics dataset comprising 30 million spatial transcriptomic profiles and encompassing both imaging- and sequencing-based protocols. To facilitate integration, scGPT-spatial introduces a novel MoE (Mixture of Experts) decoder that adaptively routes samples for protocol-aware decoding of gene expression profiles. Moreover, scGPT-spatial employs a spatially aware sampling strategy and a novel neighborhood-based training objective to better capture spatial co-localization patterns among cell states within tissue. Empirical evaluations demonstrate that scGPT-spatial robustly integrates spatial data in multi-slide and multi-modal settings, and effectively supports cell-type deconvolution and contextualized missing gene expression imputation, outperforming many existing methods. The scGPT-spatial codebase is publicly available at https://github.com/bowang-lab/scGPT-spatial.
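
To make the idea of a Mixture-of-Experts decoder concrete, the following is a minimal NumPy sketch of one generic way such routing can work: a gating layer turns each cell embedding into weights over several expert decoders, and the predicted expression profile is the gate-weighted sum of the expert outputs. This is an illustration only, not the scGPT-spatial implementation. The class name `ToyMoEDecoder`, the linear experts, the dense softmax gating, and all dimensions are assumptions made for this example; the actual model may route differently (for example, with sparse or protocol-conditioned gating).

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


class ToyMoEDecoder:
    """Hypothetical dense (soft) Mixture-of-Experts decoder.

    Assumption: each expert is a single linear map from the embedding
    space to gene space, and every cell is softly routed to all experts.
    """

    def __init__(self, d_model, n_genes, n_experts, seed=0):
        rng = np.random.default_rng(seed)
        # Gating weights: embedding -> one logit per expert.
        self.w_gate = rng.normal(scale=0.1, size=(d_model, n_experts))
        # One linear decoder per expert: embedding -> gene expression.
        self.w_experts = rng.normal(scale=0.1, size=(n_experts, d_model, n_genes))

    def __call__(self, h):
        # h: (n_cells, d_model) cell or spot embeddings.
        gate = softmax(h @ self.w_gate, axis=-1)  # (n_cells, n_experts)
        # Every expert decodes every cell: (n_cells, n_experts, n_genes).
        expert_out = np.einsum("cd,edg->ceg", h, self.w_experts)
        # Combine expert predictions using the routing weights.
        pred = np.einsum("ce,ceg->cg", gate, expert_out)  # (n_cells, n_genes)
        return pred, gate


# Toy usage with random embeddings standing in for encoder outputs.
rng = np.random.default_rng(1)
embeddings = rng.normal(size=(5, 16))
decoder = ToyMoEDecoder(d_model=16, n_genes=8, n_experts=4)
pred, gate = decoder(embeddings)
print(pred.shape, gate.shape)
```

In a trained model of this kind, one would expect the rows of `gate` to differ systematically between profiles from different protocols, which is the intuition behind protocol-aware decoding; in this untrained sketch the weights are random and carry no such meaning.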
