SpaceBF: Spatial coexpression analysis using Bayesian Fused approaches in spatial omics datasets

Abstract

Advances in spatial omics enable measurement of genes (spatial transcriptomics) and peptides, lipids, or N-glycans (mass spectrometry imaging) across thousands of locations within a tissue. While detecting spatially variable molecules is a well-studied problem, robust methods for identifying spatially varying co-expression between molecule pairs remain limited. We introduce SpaceBF, a Bayesian fused modeling framework that estimates co-expression at both local (location-specific) and global (tissue-wide) levels. SpaceBF enforces spatial smoothness via a fused horseshoe prior on the edges of a predefined spatial adjacency graph, allowing large, edge-specific differences to escape shrinkage while preserving overall structure. In extensive simulations, SpaceBF achieves higher specificity and power than commonly used methods that leverage geospatial metrics, including bivariate Moran's I and Lee's L. We also benchmark the proposed prior against standard alternatives, such as intrinsic conditional autoregressive (ICAR) and Matérn priors. Applied to spatial transcriptomics and proteomics datasets, SpaceBF reveals cancer-relevant molecular interactions and patterns of cell-cell communication (e.g., ligand-receptor signaling), demonstrating its utility for principled, uncertainty-aware co-expression analysis of spatial omics data.
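
For orientation, one common way to write a fused horseshoe prior over the edges of a graph is sketched below. This is a generic textbook-style form, not necessarily SpaceBF's exact hierarchy: the symbols (location-specific coefficients \beta_i, edge set E, local scales \lambda_{ij}, global scale \tau) are assumptions introduced here for illustration, and the paper's treatment of the global scale or of any variance scaling may differ.

```latex
\begin{aligned}
\beta_i - \beta_j \mid \lambda_{ij}, \tau &\sim \mathcal{N}\!\left(0,\; \lambda_{ij}^{2}\,\tau^{2}\right), && (i,j) \in E,\\
\lambda_{ij} &\sim \mathrm{C}^{+}(0, 1), \qquad \tau \sim \mathrm{C}^{+}(0, 1).
\end{aligned}
```

Under this form, the global scale \tau pulls most neighboring differences toward zero (spatial smoothness), while the heavy-tailed half-Cauchy prior on each \lambda_{ij} allows an individual edge to carry a large difference, which matches the abstract's description of edge-specific differences escaping shrinkage.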

Article activity feed

  1. This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag006), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

    Reviewer 2: Daniel Domovic

    Dear authors,

    I read your manuscript "SpaceBF: Spatial coexpression analysis using Bayesian Fused approaches in spatial omics datasets" with interest.

    The manuscript presents SpaceBF, a Bayesian method for detecting spatial co-expression between pairs of molecules in spatial omics data. The topic is relevant because new technologies such as spatial transcriptomics, mass spectrometry imaging, and multiplex immunofluorescence produce large datasets, yet current tools for co-expression analysis are limited. The authors address this gap with a new model, which they also test on real datasets. The paper is technical, but it also gives biological examples, which is helpful for readers.

    The paper has many strong points. First, the idea of combining a Bayesian fused horseshoe prior with an MST spatial structure is new and well explained. Second, the authors apply their method to three real datasets and show interesting biology, for example the IGF2-IGF1R relationship, keratin isoform consistency, and stromal ECM peptides. Third, I appreciate that the code is openly available on GitHub. The paper also compares SpaceBF with other methods and deals with the common problem of variance-stabilizing transformation by modeling UMI counts directly with a negative binomial distribution.

    Overall, the work is clear and well organized, but there are some points where more explanation or clarification would help. In my review I give major and minor remarks that I hope will improve the paper.

    Major remarks

    1. Were you concerned that choosing an MST may oversimplify spatial relationships, since many meaningful local neighborhoods may be excluded? Would the results of SpaceBF differ significantly if a different spatial graph, such as kNN, Delaunay triangulation, or a kernel-based graph, were used instead of the MST?
    2. Since MST edges depend a lot on pairwise L2 distances, how stable are the results if spatial coordinates are a little noisy, or if there are tissue registration errors?
    3. The model puts one molecule as outcome and the other as predictor. Are the co-expression estimates still the same if you switch roles?
    4. In the Results you mention "FDR < 0.1." Can you explain which method you used for FDR? Also, are the discoveries robust if you change the threshold (for example 0.05 vs 0.1)?
    5. Do the simulation parameters (lengthscale, slope, dispersion) correspond to realistic biological signal strengths and spatial scales observed in real datasets? Three values of the lengthscale l are considered, l = 3.6, 7.2, 18. Why exactly these values? What does ν=0.75 mean in terms of effect size? How does l=18 compare to real tissue lengthscales?
    6. Can you describe runtime and memory for larger datasets, like 10X Visium with 5,000-20,000 spots? Is the current MCMC practical for this scale, or do you think approximate inference (like variational Bayes or INLA) is needed?

    Minor remarks

    1. How sensitive are the results to the choice of hyperparameters for the Horseshoe prior?
    2. In the Results you state that keratins "co-express highly, meaning their binding patterns with any specific type 1 keratin should be similar." Please make clear that SpaceBF measures co-expression, not direct binding, so that conclusions are not overstated.
    3. You mention SpatialCorr and Copulacci, but the comparison was not successful. Even if parameters were sensitive, I think one short numerical comparison in the supplement would be helpful.
    4. You filter out genes with fewer than ~59 total reads (0.2 × the number of spots). Can you justify this threshold and show whether the results are stable for other thresholds (for example 0.1× or 0.5×)? Since many ligands and receptors are lowly expressed, is there a risk of losing meaningful biology? Because the dataset has only 293 spots, the threshold can have a strong effect.
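
The graph-choice and stability questions in major remarks 1 and 2 can be probed with a small experiment. The sketch below is only an illustration under stated assumptions, not the SpaceBF implementation: the coordinates are synthetic uniform draws (293 points, to mirror the spot count mentioned above), the jitter scale of 0.1 and k = 4 are arbitrary, and the helper names `mst_edges`, `knn_edges`, and `jaccard` are hypothetical. It builds a Euclidean MST and a symmetrised kNN graph with SciPy and measures how much of each edge set survives a small perturbation of the coordinates.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial import cKDTree
from scipy.spatial.distance import pdist, squareform


def mst_edges(coords):
    """Undirected edge set of the Euclidean minimum spanning tree."""
    dist = squareform(pdist(coords))            # dense pairwise L2 distances
    tree = minimum_spanning_tree(dist).tocoo()  # sparse matrix holding n - 1 edges
    return {(min(int(i), int(j)), max(int(i), int(j)))
            for i, j in zip(tree.row, tree.col)}


def knn_edges(coords, k=4):
    """Undirected edge set of the symmetrised k-nearest-neighbour graph."""
    # Query k + 1 neighbours because column 0 is the point itself.
    _, idx = cKDTree(coords).query(coords, k=k + 1)
    return {(min(i, int(j)), max(i, int(j)))
            for i, row in enumerate(idx) for j in row[1:]}


def jaccard(a, b):
    """Share of edges common to both sets, relative to their union."""
    return len(a & b) / len(a | b)


rng = np.random.default_rng(0)
coords = rng.uniform(0, 20, size=(293, 2))                 # synthetic spot locations
noisy = coords + rng.normal(scale=0.1, size=coords.shape)  # small registration noise

mst, knn = mst_edges(coords), knn_edges(coords)
print("MST edges:", len(mst), "| kNN edges:", len(knn))
print("fraction of MST edges also in kNN graph:", len(mst & knn) / len(mst))
print("MST Jaccard overlap after jitter:", jaccard(mst, mst_edges(noisy)))
print("kNN Jaccard overlap after jitter:", jaccard(knn, knn_edges(noisy)))
```

The MST keeps exactly n - 1 edges, so every location is connected but most local neighbor pairs are dropped, whereas the kNN graph retains several neighbors per location. Comparing the two Jaccard overlaps gives a rough sense of how sensitive each graph is to coordinate noise; an analogous check on real coordinates, followed by refitting the model on each graph, would address the remarks more directly.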
  2. Reviewer 1: Satwik Acharyya

    Summary: The manuscript introduces a novel statistical framework for analyzing spatially varying molecular co-expression. Leveraging a Bayesian fused modeling approach, SpaceBF estimates both local (location-specific) and global (tissue-wide) co-expression patterns, which is particularly useful for studying cell-cell communication via ligand-receptor interactions. The method outperforms traditional geospatial metrics like bivariate Moran's I and Lee's L in terms of specificity and precision. Application of SpaceBF to spatial omics data reveals new insights into molecular interactions across various cancer types, offering a powerful tool for spatial omics research. The paper is nicely written and well structured, with great visualizations, but I have the following comments.

    1. The authors missed a couple of key references on co-expression analysis of spatial omics data, such as JOBS (Chakrabarti et al., 2024) and SpaceX (Acharyya et al., 2022). I recommend including these references in the Introduction.
    2. A figure illustrating the method could be included.
    3. In the melanoma ST data analysis, the authors used the RCTD algorithm (Cable et al., 2022) for cell-type estimation. The gene expression matrix appears to be used twice in the process: once for cell-type estimation and again for the co-expression analysis. The results may be highly correlated because of this repeated use of the gene expression matrix. It would be great if the authors could address this issue.
    4. In the cSCC ST data analysis, the BayesSpace algorithm (Zhao et al., 2021) was used for spatial region identification. In Figure 2C, only cluster numbers are provided, and these are not mapped to spatial regions. It is difficult to make region-specific inferences without such regional annotation of the clusters. The gene expression matrix is used multiple times in this case as well (for spatial region identification and for co-expression analysis).
    5. Spatial omics datasets are sparse in nature. It is possible that some of these edges may not exist if the molecules are far apart. The authors are requested to justify the use of a shrinkage prior such as the horseshoe rather than a spike-and-slab prior.
    6. While the authors briefly mention the associated computational costs, it is recommended to include a comparison of the computational costs of the different approaches in the simulation studies. This would provide a more comprehensive understanding of the proposed method's efficiency and feasibility. It would also be interesting to see how the method scales to large datasets.
    7. To ensure the robustness of the proposed methodology, it is requested that the authors include a detailed sensitivity analysis for the selected priors and parameters.
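
The contrast raised in comment 5 can be made concrete by drawing from the two kinds of prior. The sketch below is a generic illustration, not the authors' model: the draws stand in for edge-wise differences, and the global scale `tau`, the inclusion probability `incl_prob`, and the slab standard deviation `slab_sd` are arbitrary values chosen for the example. It shows that a horseshoe prior never places exact zeros but concentrates mass near zero while keeping heavy tails, whereas a spike-and-slab prior sets a fixed share of differences exactly to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Horseshoe: x | lambda ~ N(0, lambda^2 * tau^2), lambda ~ half-Cauchy(0, 1).
tau = 0.1                               # global scale, fixed here for illustration
lam = np.abs(rng.standard_cauchy(n))    # heavy-tailed local scales
horseshoe = rng.normal(size=n) * lam * tau

# Spike-and-slab: exactly zero with probability 1 - incl_prob,
# otherwise a draw from the N(0, slab_sd^2) slab.
incl_prob, slab_sd = 0.1, 1.0
included = rng.random(n) < incl_prob
spike_slab = np.where(included, rng.normal(scale=slab_sd, size=n), 0.0)

for name, draws in [("horseshoe", horseshoe), ("spike-and-slab", spike_slab)]:
    print(f"{name:>14}: exact zeros = {np.mean(draws == 0):.3f}, "
          f"|x| < 0.01 = {np.mean(np.abs(draws) < 0.01):.3f}, "
          f"|x| > 1 = {np.mean(np.abs(draws) > 1):.3f}")
```

The practical trade-off is that the horseshoe is a continuous prior, which usually makes posterior sampling simpler, while spike-and-slab gives explicit edge inclusion or exclusion at the cost of a discrete search over configurations. Whether that trade-off favors one prior in this setting is the question the comment asks the authors to address.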