S3R: Modeling spatially varying associations with Spatially Smooth Sparse Regression
Abstract
Spatial transcriptomics (ST) data demand models that recover how associations among molecular and cellular features change across tissue while contending with noise, collinearity, cell mixing, and thousands of predictors. We present Spatially Smooth Sparse Regression (S3R), a general statistical framework that estimates location-specific coefficients linking a response feature to high-dimensional spatial predictors. S3R combines structured sparsity with a minimum-spanning-tree–guided smoothness penalty, yielding coefficient fields that are coherent within neighborhoods yet permit sharp boundaries. In synthetic data, S3R accurately recovers spatially varying effects, selects relevant predictors, and preserves known boundaries. Applied to Visium-based ST data, S3R recapitulates layer-specific target gene–transcription factor (TF) associations in human dorsolateral prefrontal cortex, with concordant layer-wise correlations in matched single-cell data. In acute Haemophilus ducreyi skin infection, S3R converts spot-level gene expression mixtures into cell type–attributed expression fields, revealing per-cell-type spatial gradients and improving the concordance of spatially variable gene calls when tests are applied to these demixed fields. In pancreatic ductal adenocarcinoma, S3R builds cross–cell-type, cross-gene co-variation tensors that quantify cell–cell interaction strength at gene-pair resolution and nominate interacting genes whose pathway enrichments align with established stromal–epithelial and immune crosstalk. An efficient implementation scales to large assays; on a Xenium-based breast cancer dataset, S3R delineates contributions to target gene expression at the gene–gene, local-neighborhood, and global-context levels. Because responses and predictors in S3R are user-defined, it could flexibly address diverse biological questions within a single, scalable, and interpretable regression framework.
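To make the idea of combining structured sparsity with a minimum-spanning-tree-guided smoothness penalty concrete, the following is a minimal sketch of what such an objective could look like. It is not the authors' implementation. The abstract does not give the exact form of either penalty, so everything specific here is an assumption made for illustration: the Euclidean MST over spot coordinates, the L1 (total-variation-style) penalty on coefficient differences along MST edges, the group-lasso-style column norm standing in for "structured sparsity", and the hypothetical names `mst_edges`, `penalized_objective`, `lam_smooth`, and `lam_sparse`.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial import distance_matrix


def mst_edges(coords):
    """Return the edges of the Euclidean minimum spanning tree over spots.

    coords: (n, d) array of spot coordinates, assumed distinct (SciPy treats
    zero entries of a dense matrix as missing edges, so duplicated
    coordinates would need a small jitter first).
    """
    dist = distance_matrix(coords, coords)
    tree = minimum_spanning_tree(dist).tocoo()
    return np.column_stack([tree.row, tree.col])


def penalized_objective(y, X, B, edges, lam_smooth, lam_sparse):
    """Squared-error loss plus assumed smoothness and sparsity penalties.

    y: (n,) response at each spot.
    X: (n, p) predictors at each spot.
    B: (n, p) location-specific coefficients, one row per spot.
    """
    # Each spot uses its own coefficient row: y_i is fit by x_i . b_i.
    residual = y - np.sum(X * B, axis=1)
    loss = 0.5 * np.sum(residual ** 2)

    # Assumed smoothness term: L1 norm of coefficient differences across
    # MST edges. An L1 penalty tolerates a few large jumps, which is one
    # way to allow sharp boundaries between otherwise smooth regions.
    diffs = B[edges[:, 0]] - B[edges[:, 1]]
    smooth = np.sum(np.abs(diffs))

    # Assumed structured-sparsity term: one group per predictor (a column
    # of B), so a predictor can be dropped at all locations at once.
    sparse = np.sum(np.linalg.norm(B, axis=0))

    return loss + lam_smooth * smooth + lam_sparse * sparse
```

In this sketch, a coefficient field that is constant across spots incurs no smoothness penalty, and a predictor whose coefficients are zero everywhere incurs no sparsity penalty. Restricting the smoothness term to MST edges keeps the number of difference terms at n - 1 for n spots, which is one plausible reason a tree-based graph helps with scaling.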