TissueNarrator: Generative Modeling of Spatial Transcriptomics with Large Language Models
Abstract
The intricate spatial organization and molecular communication among cells are fundamental to multicellular systems. Spatial transcriptomics (ST) enables gene expression profiling while preserving spatial context, providing rich data for studying cellular interactions and tissue dynamics. However, most existing computational approaches focus on embedding-based tasks and provide limited generative capacity for simulating cell behavior in situ. Moreover, accurately interpreting spatial interactions requires extensive biological knowledge, which current models do not incorporate. Here, we introduce TissueNarrator, a framework that reformulates spatial omics analysis as a language modeling problem. By representing tissue sections as spatial sentences – rank-based gene lists augmented with spatial coordinates and metadata – TissueNarrator leverages pretrained large language models (LLMs) to learn spatially conditioned gene expression patterns. The model generates realistic, context-aware cellular profiles, predicts intercellular interactions, and performs in silico perturbation analyses. Across multiple ST technologies (MERFISH, Perturb-FISH, and CosMx SMI), TissueNarrator achieves superior quantitative performance and recovers biologically meaningful ligand–receptor and signaling pathways. Furthermore, a conversational inference mode enables natural-language querying of tissue organization. By integrating pretrained biological knowledge with spatial context, TissueNarrator establishes a new, scalable generative paradigm for modeling, simulating, and reasoning about tissue systems.
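To make the idea of a spatial sentence concrete, here is a minimal Python sketch of how one cell might be serialized as a rank-based gene list with coordinates and metadata. The abstract does not specify the actual text template, so the function name `spatial_sentence`, the field labels, the `top_k` cutoff, and the tie-breaking rule are all assumptions made for illustration, not the authors' format.

```python
from typing import Dict


def spatial_sentence(
    expression: Dict[str, float],
    x: float,
    y: float,
    cell_type: str,
    top_k: int = 5,
) -> str:
    """Serialize one cell as text (hypothetical template, not the paper's)."""
    # Keep only genes that were actually detected in this cell.
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    # Rank by descending expression; break ties alphabetically for determinism.
    ranked = sorted(expressed, key=lambda item: (-item[1], item[0]))
    genes = [gene for gene, _ in ranked[:top_k]]
    # Prepend metadata and spatial coordinates to the ranked gene list.
    return (
        f"cell_type: {cell_type} | x: {x:.1f} | y: {y:.1f} | "
        f"genes: {' '.join(genes)}"
    )


# Toy expression values, invented purely for illustration.
cell = {"Gad1": 12.0, "Slc17a7": 0.0, "Snap25": 30.0, "Gfap": 3.0}
sentence = spatial_sentence(cell, x=104.0, y=87.5, cell_type="inhibitory neuron", top_k=3)
print(sentence)
```

In this sketch only the rank order of genes survives, not their expression magnitudes. A tissue section would then be some ordered collection of such sentences. How neighboring cells are ordered or grouped is not described in the abstract, so it is left out here.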