TopicVI: A Knowledge-guided deep interpretable model for resolving context-specific gene programs
Abstract
Mechanistic insights from single-cell and spatial transcriptomics largely rely on cell clustering, differential expression analysis, and interpretation through prior biological knowledge. However, this approach is often limited by the reliance on curated biological priors that fail to capture context-specific gene programs, particularly in complex disease states. To address this gap, we introduce TopicVI, a deep interpretable model that integrates established biological knowledge with data-driven refinement to discover context-dependent gene programs in single-cell and spatial transcriptomic data. TopicVI jointly infers cell clusters and gene topics using optimal transport to flexibly align prior gene programs with observed data while permitting context-specific refinements. Comprehensive benchmarking demonstrates that TopicVI outperforms existing methods in biological conservation, batch correction, topic coherence, and rare cell identification. TopicVI effectively disentangles multiple sources of biological variation, such as separating anatomy-specific expression patterns from disease-associated signatures in spatial transcriptomics. Applying TopicVI to glioblastoma datasets, we identify gene topics related to cell cycle regulation and EGFR signaling that reveal convergent tumor states across distinct drug perturbations. By integrating prior knowledge with data-driven discovery, TopicVI enables identification of interpretable gene programs that illuminate biological processes and therapeutic mechanisms in complex transcriptomics data.