Stardust: improving spatial transcriptomics data analysis through space aware modularity optimization based clustering

Abstract

Background

Spatial transcriptomics (ST) combines stained tissue images with spatially resolved high-throughput RNA sequencing. Spatial transcriptomic analysis includes challenging tasks such as clustering, in which data points (spots) are partitioned according to a similarity measure. Improving clustering results is key, because clustering affects downstream analysis. State-of-the-art approaches group data by transcriptional similarity, and some also exploit spatial information. However, it is not yet clear how much combining spatial information with transcriptomics improves the clustering result.

Results

We propose a new clustering method, Stardust, that easily exploits the combination of spatial and transcriptomic information in the clustering procedure through manual or fully automatic tuning of the algorithm parameters. Moreover, we provide a parameter-free version of the method in which the spatial contribution depends dynamically on the distribution of expression distances in space. We evaluated the results of the proposed methods on ST datasets available on the 10x Genomics website. We compared their clustering performance with state-of-the-art approaches by measuring the stability of spots in the clusters and their biological coherence. Stability is defined as the tendency of each point to remain clustered with the same neighbours when perturbations are applied.
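The stability notion defined above can be sketched in Python. This is a minimal illustration under stated assumptions, not Stardust's own scoring code. It assumes that each clustering is a dict mapping a spot ID to a cluster label, and that a perturbation may drop spots. Each spot is scored by the mean Jaccard overlap between its cluster-mates in the reference clustering and its cluster-mates in each perturbed run. The names `co_clustered` and `spot_stability` are hypothetical.

```python
from typing import Dict, List, Sequence, Set

Labels = Dict[str, int]  # spot ID -> cluster label


def co_clustered(labels: Labels, spot: str) -> Set[str]:
    """Return the spots (other than `spot`) that share its cluster label."""
    return {s for s, c in labels.items() if c == labels[spot] and s != spot}


def spot_stability(reference: Labels, runs: Sequence[Labels]) -> Dict[str, float]:
    """Mean Jaccard overlap of each spot's cluster-mates across perturbed runs.

    Hypothetical scoring: 1.0 means a spot always keeps the same neighbours,
    0.0 means it never does. Spots absent from every run get NaN.
    """
    scores: Dict[str, float] = {}
    for spot in reference:
        overlaps: List[float] = []
        for run in runs:
            if spot not in run:
                continue  # this perturbation removed the spot
            shared = reference.keys() & run.keys()  # compare only spots present in both
            ref_mates = co_clustered(reference, spot) & shared
            run_mates = co_clustered(run, spot) & shared
            union = ref_mates | run_mates
            # Two empty neighbour sets count as perfect agreement.
            overlaps.append(len(ref_mates & run_mates) / len(union) if union else 1.0)
        scores[spot] = sum(overlaps) / len(overlaps) if overlaps else float("nan")
    return scores
```

Consider `reference = {"a": 0, "b": 0, "c": 1, "d": 1}` with a single perturbed run in which `b` moves to cluster 1. Spot `a` loses its only cluster-mate and scores 0.0, while `c` and `d` each score 0.5.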

Conclusions

Stardust is an easy-to-use methodology that lets users define how much spatial information should influence clustering on different tissues, and it achieves more stable results than state-of-the-art approaches.

  1. Background

    This work has been published in GigaScience under a CC-BY 4.0 license (https://doi.org/10.1093/gigascience/giac075), and its reviews have been published under the same license.

    **Reviewer 1. Nikos Karaiskos**

    Reviewer Comments to Author: In this article the authors developed Stardust, a computational method for spatially informed clustering that combines transcriptional profiles and spatial information. As spatial sequencing technologies gain popularity, it is important to develop tools that can efficiently process and analyse such datasets, and Stardust is a new method in this direction. It is particularly appealing to make use of spatial information and relationships to cluster gene expression in these datasets. Overall, the quality of the data used is high and the manuscript is clearly written. The algorithm behind Stardust is simple and consists of an interpolation between spatial and transcriptional distance matrices. A single parameter, called the space weight, controls the contribution of the spatial distance matrix. The authors benchmark Stardust against other recently developed tools on five different spatial transcriptomics datasets using two measures. Stardust therefore has the potential to be a useful method that can be applied to different datasets.
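The interpolation between distance matrices described above can be sketched as follows. This is a minimal Python illustration, not the Stardust implementation. It makes three assumptions:

- Euclidean distances are computed on expression profiles and on spot coordinates.
- Each matrix is min-max normalised to [0, 1].
- The two are combined additively as D = T + w·S, where w is the space weight.

The real method may use other distance measures, other normalisation, or an interpolation such as (1 - w)·T + w·S. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def euclidean_matrix(points: Sequence[Sequence[float]]) -> Matrix:
    """Pairwise Euclidean distances between the rows of `points`."""
    return [[math.dist(p, q) for q in points] for p in points]


def min_max(m: Matrix) -> Matrix:
    """Rescale all entries of a distance matrix to [0, 1]."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for constant matrices
    return [[(v - lo) / span for v in row] for row in m]


def combined_distance(expression: Sequence[Sequence[float]],
                      coords: Sequence[Tuple[float, float]],
                      space_weight: float) -> Matrix:
    """Assumed form D = T + w * S on min-max-normalised matrices."""
    t = min_max(euclidean_matrix(expression))  # transcriptional distances T
    s = min_max(euclidean_matrix(coords))      # spatial distances S
    return [[t_ij + space_weight * s_ij for t_ij, s_ij in zip(t_row, s_row)]
            for t_row, s_row in zip(t, s)]
```

Setting `space_weight=0` reduces D to the normalised transcriptional distances. Larger values pull spatially close spots together. The resulting matrix would then presumably feed the neighbour graph used by the modularity-optimization step named in the title.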

    Before recommending the manuscript for publication, however, the authors should thoroughly address the following points:

    1. What is the rationale behind modelling the contributions as a linear sum of the spatial and transcriptional distance matrices? In particular, why did the authors not consider non-linear relationships as well? As cells neighboring in space often share similar transcriptional profiles (see for instance Nitzan et al., 2019 for this line of reasoning and several examples therein), I would expect product terms to be even more informative.
    2. The authors demonstrate Stardust's performance only on datasets obtained with the 10x Visium platform. How does Stardust perform on higher-resolution methods such as Slide-seq, Seq-Scope, etc.? As ST methods improve in resolution, it is critical to be able to analyze such datasets as well. An important question here concerns scalability: how well does Stardust scale with the number of cells/spots?
    3. In Fig. 1b, conclusions are drawn from the CSS for different space weights, but only for a clustering parameter of 0.8. What happens for other clustering parameter values? Can the authors also comment on why the different space weight values do not perform consistently across the datasets (e.g. 0.5 is better for HBC2 but 0.75 for MK)?
    4. The authors compared Stardust with four other tools and conclude that Stardust outperforms all other methods and performs equivalently to BayesSpace. All of these methods, however, rely on choosing specific values for a number of parameters. Did the authors optimize these values when benchmarking these methods against Stardust?
    5. I was able to successfully install Stardust and run it. The resulting clusters in the Seurat object, however, were all NAs. The authors should make an effort to better document how Stardust runs, including the input structure that the tool expects and potential issues that might arise.
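Point 1 contrasts the linear sum with a form that includes product terms. The sketch below assumes the transcriptional (T) and spatial (S) distance matrices are already normalised to [0, 1]. With `interaction = 0` it reduces to the linear form T + w·S. A positive `interaction` (a hypothetical parameter, not part of Stardust) adds a γ·T·S term, so the effective spatial weight becomes w + γ·T_ij. Spatial distance therefore counts more for transcriptionally dissimilar pairs and less for similar ones.

```python
from typing import List

Matrix = List[List[float]]


def combine(t: Matrix, s: Matrix, space_weight: float,
            interaction: float = 0.0) -> Matrix:
    """Combine normalised distance matrices T and S.

    interaction = 0 gives the linear form T + w*S; interaction > 0 adds a
    hypothetical product term gamma * T * S (equivalently, a pair-specific
    spatial weight of w + gamma * T_ij).
    """
    return [[t_ij + space_weight * s_ij + interaction * t_ij * s_ij
             for t_ij, s_ij in zip(t_row, s_row)]
            for t_row, s_row in zip(t, s)]
```

Whether such a term helps would have to be tested empirically. It adds a second parameter that would need tuning alongside the space weight.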

    Re-review: The authors have successfully addressed all raised points. The introduction of Stardust*, in particular, is a valuable enhancement of the method. Therefore, I recommend the manuscript for publication.

  2. Spatial

    **Reviewer 2. Quan Nguyen**

    Reviewer Comments to Author: This work presents a new clustering method, Stardust, that has the potential to improve the stability of clustering results against parameter changes. Stardust can assess how much spatial information contributes to the clustering result relative to gene expression information. Stardust appears to perform better than other methods on the two metrics used in this paper, stability and coefficient of variation. The essence of the method is a spatial transcriptomics (ST) distance matrix defined as a simple linear combination of the physical distance (S) and transcriptional distance (T) matrices. A weight factor is applied to the S matrix to control and evaluate the contribution of spatial information. The effort to evaluate multiple parameters and to compare against several recent methods across a number of public spatial datasets is a highlight of the work. The authors also made the code available.
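The review names the coefficient of variation (CV) as the second metric without defining it. This sketch assumes one plausible reading: the CV of a gene's expression across the spots of each cluster, computed as the sample standard deviation divided by the mean. Under that reading, lower values indicate more homogeneous clusters. The manuscript's exact definition may differ, and `within_cluster_cv` is a hypothetical name.

```python
import statistics
from typing import Dict, List, Sequence


def within_cluster_cv(expression: Sequence[float],
                      labels: Sequence[int]) -> Dict[int, float]:
    """Coefficient of variation (sample SD / mean) of one gene per cluster."""
    groups: Dict[int, List[float]] = {}
    for value, label in zip(expression, labels):
        groups.setdefault(label, []).append(value)
    cv: Dict[int, float] = {}
    for label, values in groups.items():
        mean = statistics.fmean(values)
        if len(values) < 2 or mean == 0:
            cv[label] = float("nan")  # CV undefined for singletons or zero mean
        else:
            cv[label] = statistics.stdev(values) / mean
    return cv
```

Averaging such per-gene values over marker genes would give one cluster-level coherence summary. That is again an assumption about how the metric is aggregated.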

    Major comments:

    • The concept of combining spatial location and gene expression is not new and has been applied in most spatial clustering methods. It is not clear what the new additions to currently available methods are, apart from a feature to weigh the contribution of spatial components to the clustering results.
    • The approach to assessing the contribution of spatial information, by varying the weight factor from 0 to 1, is rather simple. The contribution can be nonlinear and can vary between spots/cells. For example, spatial distance becomes more important for spots/cells that are nearer to each other, and some genes are more spatially variable than others. Applying a single weight factor to all genes and all spots would miss these sources of variation.
    • Five weight factors (0, 0.25, 0.50, 0.75, and 1) were used. However, this range provides too few data points to assess the impact of the spatial factor. As seen in the figures, the five data points do not clearly indicate where the spatial contribution is maximal or minimal, owing to large fluctuations of the values on the y-axis.
    • Although two performance metrics are used (stability and variation), an additional metric is needed to assess how well the clustering results represent the biological ground truth, i.e. cell type composition or tissue architecture (for example, by comparison with pathological annotation). Without it, it is unclear whether the Stardust results are closer to the biological ground truth.
    • Stardust was tested on multiple 10x Visium datasets, but other types of spatial transcriptomics data, such as seqFISH, Slide-seq, and MERFISH, are also common. An extended assessment of potential applications to other technologies would be useful.

    Minor comments:

    • The paragraphs and figure legends in the Results section are repetitive.
    • The Results section is descriptive, and there is no Discussion section.
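The major comment about the coarse grid of five space weights suggests a finer sweep, which is straightforward to script. In this sketch `evaluate` is a hypothetical callback. It would run the clustering at a given space weight and return a summary score, for example median spot stability. Neither function is part of Stardust.

```python
from typing import Callable, List, Tuple


def weight_grid(step: float = 0.05) -> List[float]:
    """Evenly spaced space weights from 0 to 1 inclusive.

    The step is rounded so that the grid ends exactly at 1.
    """
    if not 0 < step <= 1:
        raise ValueError("step must be in (0, 1]")
    n = round(1.0 / step)
    return [round(i / n, 10) for i in range(n + 1)]


def sweep(evaluate: Callable[[float], float],
          step: float = 0.05) -> List[Tuple[float, float]]:
    """Score each space weight with a user-supplied `evaluate` function."""
    return [(w, evaluate(w)) for w in weight_grid(step)]
```

`weight_grid(0.25)` reproduces the five values used in the manuscript, while the default step of 0.05 yields 21 weights. A finer grid does not remove run-to-run fluctuation, so repeating each point over several perturbations would also help.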

    Re-review:

    The authors have improved the initial manuscript markedly. There are a couple of important points regarding comparisons between Stardust and Stardust* that need to be addressed:

    1. In which cases does Stardust* improve over Stardust? The results seem likely to depend on the biological system (i.e., tissue type). The authors suggest that both versions produce comparable results. However, the formula changes substantially: a constant weight is replaced with variable weights given by gene expression values min-max normalised to [0, 1]. There are therefore likely to be differences between Stardust and Stardust*. For example, certain genes will have higher weights than others, so the effect of the weights varies among genes. To examine this, the authors could compare highly abundant with lowly abundant genes.
    2. In cases where spatial distances are important, Stardust* could be less accurate than Stardust with a high space weight. How does Stardust* handle cases in which spatial distance is as important as gene expression?
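Both re-review points can be probed with a sketch of a parameter-free weighting. The abstract says that the spatial contribution depends on the distribution of expression distances. This sketch assumes, on that basis, that each pair's space weight is its own min-max-normalised transcriptional distance, so D_ij = T_ij + T_ij·S_ij. The code below then exhibits the behaviour raised in point 2. When two spots have near-identical expression, their spatial distance barely contributes, however far apart they are. The exact Stardust* formula may differ, and all names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def min_max(m: Matrix) -> Matrix:
    """Rescale all entries of a distance matrix to [0, 1]."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # constant matrices map to all zeros
    return [[(v - lo) / span for v in row] for row in m]


def dynamic_combined_distance(t: Matrix, s: Matrix) -> Matrix:
    """Assumed parameter-free variant: pair weight w_ij = normalised T_ij."""
    t_n, s_n = min_max(t), min_max(s)
    return [[t_ij + t_ij * s_ij for t_ij, s_ij in zip(t_row, s_row)]
            for t_row, s_row in zip(t_n, s_n)]
```

Under this assumed form the spatial term can never outweigh the transcriptional one, since T_ij·S_ij ≤ T_ij. The constant-weight version with w = 1 treats the two as equally important, which is the comparison raised in point 2.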