spatiAlign: an unsupervised contrastive learning model for data integration of spatially resolved transcriptomics

Abstract

Background

Integrative analysis of spatially resolved transcriptomics datasets empowers a deeper understanding of complex biological systems. However, integrating multiple tissue sections presents challenges for batch effect removal, particularly when the sections are measured by various technologies or collected at different times.

Findings

We propose spatiAlign, an unsupervised contrastive learning model that employs the expression of all measured genes and the spatial location of cells, to integrate multiple tissue sections. It enables the joint downstream analysis of multiple datasets not only in low-dimensional embeddings but also in the reconstructed full expression space.

Conclusions

In benchmarking analysis, spatiAlign outperforms state-of-the-art methods in learning joint and discriminative representations for tissue sections, each potentially characterized by complex batch effects or distinct biological characteristics. Furthermore, we demonstrate the benefits of spatiAlign for the integrative analysis of time-series brain sections, including spatial clustering, differential expression analysis, and particularly trajectory inference that requires a corrected gene expression matrix.

Article activity feed

  1. Competing Interest Statement: The authors have declared no competing interest.

    Reviewer 3. Jose Fernandez Navarro

    The authors present a novel computational method to integrate SRT datasets, claiming that the method adjusts for batch effects while retaining biological differences. The method can also adjust the gene expression counts for use in downstream analysis. It was benchmarked against other methods available for the integration of single-cell and spatial transcriptomics datasets, with positive results. The manuscript is well structured and clear; it provides a robust motivation, and the comparisons with other methods are clear and well defined. The method has the potential to make a contribution to the field, especially considering that it has been developed to be compatible with scanpy and that an open-source library has been made available on GitHub.

    Introduction:
    - In the following sentence: "batch effects caused by nonbiological factors such as technology differences and different experimental batches." the authors could have elaborated more and perhaps included some references.
    - In the following sentence: "In contrast, popular MNN-based methods such as Seurat v3[16] efficiently address batch effects in gene expression, but their limitation lies in the ability to align only two batches at a time, and they become impractical when dealing with many batches" I do not think the term "MNN-based" is correct in that context. I also do not entirely agree with the claim. One generally does not have many batches to correct for, and the referred methods can perform batch correction in datasets with more than 2 batches.
    - In the following statement: "However, PRECAST only returns the corrected embedding space, and GraphST requires registering the spatial coordinates of samples first to ensure its integration performance; thus, their applications are limited in certain scenarios." I am not in total agreement. As I understand it, PRECAST provides a module to obtain corrected gene expression counts for downstream analysis.

    Results:
    - I find the introduction to spatiAlign a bit long. It could perhaps be simplified, leaving the implementation details to the Methods section.
    - In the following sentence: "..spatial neighbouring graphs between cells/spots (e.g., cell‒cell adjacent matrix A), where the connective relationships of cells/spots are negatively associated with Euclidean distance." I find it a bit misleading. Are the authors building the spatial graph using a fixed radius, or Euclidean distances in a manifold?
    - I could not find a detailed description of how the different datasets were processed with the other methods used for benchmarking.
    - I believe that comparing consecutive sections of the same tissue is not enough to measure the power of the methods to retain biological differences. I would also include a comparison using sections from different individuals (same region).
    - In the MOB datasets comparison, judging by the UMAP figures, the difference in performance is not so clear in the cases of SCALEX and BBKNN.
    - In the Hippocampus dataset, I did not see information on how the clusters were annotated. It would have been nice to include the ABA figures of the same region.
    - I found it difficult to understand the basis and interpretation of the spatial autocorrelation analysis with Moran's I.
    - In the MOB embryo dataset, did the authors consider including a comparison with the other methods?

    Figures:
    - I observed that some of the supplementary figures are out of order or that the labels do not match the panels; I encourage the authors to revise this.
    - I also noticed that some of the panels showing expression plots do not have a bar with the range of expression.
    - The labels in some of the panels are hard to read, and some labels are missing (e.g., the section/dataset in some of the panels).
    - Some figures make reference to the ABA and/or the tissue morphology. For these, I would suggest including the HE images and/or IF images from the ABA.
    - Figure 2a-c: the fonts are hard to read.
    - Figure 2d is hard to read; perhaps the layout would be better with one column per method?
    - Figure 3g would be easier to read if the 3 datasets were arranged side by side.
    - Figure S4: I find the clusters hard to see clearly.
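The question above about how the spatial graph is built can be made concrete with a small sketch. It contrasts two common constructions, a k-nearest-neighbour graph and a fixed-radius graph, on toy coordinates. The function names, the toy coordinates, and the parameter values are illustrative assumptions and are not taken from the spatiAlign code base.

```python
import math

def knn_graph(coords, k):
    """Directed edges from each spot to its k nearest spots (Euclidean distance)."""
    edges = set()
    for i, p in enumerate(coords):
        # Rank all other spots by distance to spot i and keep the k closest.
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(p, coords[j]),
        )
        for j in others[:k]:
            edges.add((i, j))
    return edges

def radius_graph(coords, radius):
    """Directed edges between every pair of distinct spots within a fixed radius."""
    edges = set()
    for i, p in enumerate(coords):
        for j, q in enumerate(coords):
            if i != j and math.dist(p, q) <= radius:
                edges.add((i, j))
    return edges

# Toy layout (hypothetical): three spots close together and one isolated spot.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
knn_edges = knn_graph(coords, k=2)
radius_edges = radius_graph(coords, radius=1.5)

# Under kNN the isolated spot still receives neighbours;
# under a fixed radius it stays disconnected.
isolated_in_knn = any(i == 3 for i, _ in knn_edges)
isolated_in_radius = any(i == 3 for i, _ in radius_edges)
```

The two constructions agree where spots are densely and evenly packed and differ at tissue borders or gaps, which is one reason the construction (and the value of k or the radius) is worth stating explicitly in a Methods section.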

    Datasets and documentation: The authors provide links to the original datasets but do not provide access to the processed and annotated datasets, which makes it difficult to replicate the results and the examples provided in the documentation. The manuscript would benefit if the authors provided better documentation and means to reproduce/replicate the analyses.

    Software: I was able to install the package from PyPI in a Conda environment, but I had to manually install some dependencies to make it work.

    Major comments:
    - I suggest that the authors revise the figures. The supplementary figure descriptions do not seem to match the content of the figures. Some of the figures are missing labels and color bars.
    - I suggest that the authors correct grammar and spelling errors and perform a thorough proofreading of the manuscript for consistency.
    - I would like the authors to provide links to access the processed/annotated datasets.
    - I would like the authors to provide more details on how the datasets were processed with their method and the other methods (hyperparameters, versions, etc.). This would be complemented greatly if the authors could provide notebooks or step-by-step documentation.
    - I suggest that the authors include a comparison with true biological differences such as different phenotypes and/or genotypes.
    - I suggest that the authors include some of the other methods in the MOB (Stereo-seq) comparison.
    - I suggest that the authors check their claim that PRECAST does not provide "corrected" gene counts or that the other methods do not provide means to perform downstream analyses (DEG, trajectory inference, etc.).
    - I suggest that the authors include normalized counts as well as raw counts in some of the comparisons (for example, when performing the trajectory analysis or showing the spatial distribution of features).

    Minor comments:
    - I suggest that the authors not use the term "expression enhancement"; to me, the gene expression is corrected or adjusted but not enhanced.
    - I suggest that the authors improve the documentation of the open-source package to provide more information on the different arguments and options. It would also be nice to provide documentation and/or notebooks to reproduce the analyses (or some of them) presented in the manuscript.
    - I suggest that the authors improve the installation of the PyPI package, since some dependencies seem to be missing.
    - I suggest that the authors improve the layouts and font sizes of some of the figures for clarity and readability.

    Re-review: I acknowledge the efforts made by the authors to address the comments and provide answers. However, I still find the manuscript not ready for publication. These are my comments:

    Major:
    - The authors have included a new analysis (sup. figure 7) using a dataset (tumor liver) that lacks a stereotypical structure. While this is a good addition to the manuscript, I would still like to see the performance of spatiAlign in correcting technical effects while retaining true biological differences (e.g., disease and control). In addition to this, a comparison using an imaging-based technology (e.g., MERFISH or CosMx) would make the manuscript stronger.
    - The authors have made an effort to provide Jupyter notebooks with code to reproduce the analyses. Unfortunately, this is incomplete. None of the notebooks contain code to reproduce the spatiAlign analyses, and only the notebook with the tumor liver dataset (sup. figure 7) includes the processing steps. For the other datasets the authors use hard-coded values. Moreover, I was unable to run some of the notebooks due to errors and missing files and/or dependencies. The authors should provide one notebook for each dataset, including the processing and analysis, and provide means to run the notebooks (environment files and/or Docker files) in an easy way that enables reproducibility. Ideally, these notebooks should also include the spatiAlign analysis.
    - I observed a strange effect in figure 2, where the UMAP manifolds of BBKNN, Harmony and Combat are similar. I could identify the error causing this in one of the notebooks. I strongly suggest that the authors revise all the analyses and figures and provide notebooks to reproduce these in an easy way, as I mentioned before.
    - I find the MNN performance surprisingly bad. I wonder if this could be due to how the data was processed with this method. Did the authors try to disable cosine normalization for the output?
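To illustrate why the cosine normalization question matters, here is a minimal sketch of what cosine normalization does to an expression vector. It presumably relates to an option such as `cos_norm_out` in the `mnnpy` implementation of MNN correction, but which implementation and settings the authors used is not stated here, so that link is an assumption. The helper name and the toy vectors are hypothetical.

```python
import math

def cosine_normalize(cell):
    """Scale one cell's expression vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in cell))
    # An all-zero cell is returned unchanged to avoid division by zero.
    return [x / norm for x in cell] if norm > 0 else list(cell)

# Two hypothetical cells with the same expression profile but a 10x difference in magnitude.
low_depth = [1.0, 2.0, 2.0]
high_depth = [10.0, 20.0, 20.0]

low_norm = cosine_normalize(low_depth)
high_norm = cosine_normalize(high_depth)
# After normalization both cells map to the same unit-length vector,
# so the output is no longer on the scale of the input counts.
```

If the corrected output is left on this unit-norm scale, downstream steps that expect log-normalized values may behave differently, which is one way a method's apparent performance could depend on how it was run.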

    Minor:
    - I think the manuscript would be stronger if the authors included the normalized counts in the figures where they show the raw counts.
    - I still find inconsistencies in the text (typos, grammatical and syntactical errors). The authors are still using the term "enhanced" (especially in figure legends).
    - In the MOB dataset, the authors claim that the Visium spots are 100 µm, but that cannot be true; Visium spots are 50 µm.
    - In figure 3 (panel f), use the same layout as figure 2 for consistency.
    - In figure 4 (panel g), the color bar and labels are missing.
    - In sup. figure 3 (panel c), the color bar is out of place and the legend is missing.

    Re-review: The authors have made a great effort to improve the manuscript. The improvements to the documentation and open-source package will be appreciated by the community. I only have minor comments:
    - The grammar has improved, but I could still see some errors (to cite a few):
      - line 96 "dimensional reduction"
      - line 346 "structure and MERFISH"
    - I still think that the authors have not been able to fully demonstrate the performance of their method in integrating datasets with true biological/phenotypical differences (e.g., disease and healthy). Supplementary figures 7 and 8 add value to the manuscript by integrating tumor cells from different patients, but this is not exactly what reviewer 1 and I suggested. I acknowledge the explanations that the authors provide in their response, but I am not in total agreement with the statements. There are publicly available datasets that could suit this analysis. I will not request that such an analysis be added to the manuscript, but I would at least suggest mentioning this in the manuscript as a limitation or future work.

  2. Reviewer 2. Stefano Monti

    The manuscript addresses the very challenging problem of integrating multiple spatially resolved transcriptomics datasets, and proposes a novel algorithm based on multiple deep learning techniques, including DNN encoders and self-supervised and contrastive learning. Evaluation on several datasets is presented alongside comparison to multiple existing methods using several integration metrics (LISI, ARI, etc.). The presented method appears to outperform existing methods according to multiple criteria, and thus it represents a significant contribution to the field worth publishing.

    The write-up is adequate, although the description of the method is very "abstract". It would benefit from more specificity in describing the inputs and outputs of each step, how some of the models are shared (e.g., is the DNN encoder shared only across sections/samples, or also across the original (Fig 1C, top) and perturbed (Fig 1C, bottom) inputs? Likewise for the Graph Encoder), and the intuition behind each of the steps included.

    Some specific comments:

    • It would be helpful if the results sections describing each of the applications (DLPFC datasets, Olfactory bulb datasets, etc.) were more detailed in the description of the datasets to be combined. What are the inputs (how many samples, whether sections are the same as samples, how many slices per sample, etc.)?
    • Unless I'm mistaken, the labeling of Fig S1 is wrong. I think Fig S1a is the UMAP and S1b is the "manual annotation", rather than the other way around.

  3. Reviewer 1. Lamda Moses

    This paper presents spatiAlign, a package that batch corrects spatial transcriptomics data and performs spatially informed clustering. Spatial information is incorporated in the graph layers of the variational graph autoencoder, which performs dimension reduction, and in the reduced dimensional space, self-supervised contrastive learning is used to batch correct and to assign cells/spots to clusters. The autoencoder then reconstructs a batch-corrected gene count matrix for downstream use with methods that require a full gene count matrix. The method seems reasonable for this task and is well described, more intuitively in the Results section and in more detail in the Methods section.

    Then spatiAlign is benchmarked against several popular and state of the art methods for batch correction, including two recently published methods that use spatial information (GraphST and PRECAST) and several not using spatial information but commonly used (e.g. Seurat, Harmony, COMBAT). The choice of existing methods to benchmark is fair. The LISI F1 score is a reasonable metric to quantify performance in both batch correction and cluster separation when the spatial clusters in the brain datasets used in benchmarking are already annotated. The iLISI (batch correction) and cLISI (cluster separation), analogous to precision and recall in the original F1, are shown separately in the supplement. The F1 score is around 0.8 for spatiAlign, which is pretty good. When there is no a priori annotation, the iLISI is used to quantify how well different batches mix and Moran's I is used to indicate spatial coherence of the clusters, which are then validated with differential expression. spatiAlign is also demonstrated to integrate data from different technologies—Stereo-seq and Visium—which have different spatial resolutions. Finally, spatiAlign is demonstrated on the developing mouse brain integrating data across multiple time points.

    The language of this paper is good and does not require extensive editing for clarity. The spatiAlign package can be installed with pip and has a minimal tutorial on the documentation website.

    Overall, I find this paper well-written and a valuable contribution to this field. There are many methods that perform batch correction without using spatial information, and several that align different tissue sections, some using transcriptome information, but without correcting for batch effect in the transcriptomes. Not all methods that take spatial information into account give a batch corrected full gene count matrix as an output. The metrics reasonably demonstrate superior performance of spatiAlign compared to other methods benchmarked on the datasets used.

    Below are my questions and comments that may improve this paper:

    1. All the benchmarking datasets are from the brain, though different parts of the brain, from human and mouse, with different morphologies. The brain has a stereotypical structure. As spatiAlign uses the spatial neighborhood graph rather than the original coordinates, can it be applied to tissues without such stereotypical structure, such as tumors, skeletal muscle, colon, liver, lung, and adipose tissue? Benchmarking on a dataset from a tissue without a stereotypical structure would make a stronger case, to be more representative of the full breadth of spatial transcriptomics datasets.
    2. Biological variability is mentioned, such as from different regions of hippocampus and different stages of development. Many studies have a disease or experiment group and a control group, often with multiple subjects in each group. There are biological differences among the subjects and technical batch effects between sections, but the differences between case and control are of interest, so we have different kinds of batches. Benchmarking on a case/control study would be really helpful. How well does spatiAlign preserve biological differences between case and control while correcting for technical batch effects?
    3. The Methods section says, "Inspired by unsupervised contrastive clustering[32], we map each spot/cell i into an embedding space with d dimensions, where d is equal to the number of pseudoprototypical clusters." In Tutorial 2 on the documentation website, the latent dimension is set to be 100. Why is this number chosen? Can you clarify how to choose the number of latent dimensions? How does this affect downstream results?
    4. Since you use the k nearest neighbor graph when constructing the spatial neighborhood graph that feeds into the variational graph autoencoder, what are the reasons why k=15 is chosen? Should it be different for array-based technologies such as Visium and Stereo-seq and imaging-based technologies with single cell resolution such as MERFISH? Furthermore, due to different spatial resolutions, the spatial neighborhood graph has different biological meanings for Visium and MERFISH.
    5. All the benchmarking datasets are from array-based technologies: Visium, Slide-seq, and Stereo-seq. Imaging-based technologies are being commercialized and more widely adopted, especially MERFISH and Molecular Cartography. It would be great if you benchmarked using an imaging-based dataset and perhaps integrated an imaging-based and an array-based dataset, to be more representative of the full breadth of spatial transcriptomics technologies. This should also take into consideration that imaging-based datasets typically only profile a few hundred genes while array-based datasets are transcriptome-wide. This might be too much for this paper, but should at least be mentioned in the Discussion section.
    6. Is the code used to reproduce the figures available?
    7. Generally, the y axes of bar charts for F1 scores, ARI, normalized iLISI, and normalized cLISI are really confusing when they don't start at 0 and end at 1. This exaggerates how much better spatiAlign performs compared to other methods when the other methods aren't that much worse based on the numbers, such as in Figure 2c.
    8. In Supplementary Figure S4b, do you actually mean 1 - cLISI? If a smaller cLISI is better, then spatiAlign performs the worst in this case, and should have a low F1 score in Figure 2c.
    9. It would be helpful to include a computational time and memory usage benchmark.
    10. The join count statistic is a spatial autocorrelation statistic designed for binary data, and may thus be more appropriate than Moran's I to indicate spatial coherence of clusters, although Moran's I does convey the message of spatial coherence here.
    11. The documentation website can be improved by making a description of all parameters of the functions available, to explain what each parameter means and what kind of input and output is expected.
    12. It would be helpful to include preprocessing in the tutorial on the documentation website. Do we need to log normalize the data first and why? Does the data need to be scaled?
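As a rough illustration of comment 10, the sketch below computes both Moran's I and join counts for a binary cluster indicator on a small regular grid. The grid, the rook-adjacency neighbourhood, and all function names are assumptions chosen for the example; they are not the neighbourhood definition or implementation used in the manuscript.

```python
def rook_edges(n_rows, n_cols):
    """Undirected rook-adjacency edges on a regular grid, as (i, j) index pairs."""
    edges = []
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:
                edges.append((i, i + 1))       # neighbour to the right
            if r + 1 < n_rows:
                edges.append((i, i + n_cols))  # neighbour below
    return edges

def morans_i(values, edges):
    """Moran's I with symmetric binary weights; assumes values are not all equal."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Each undirected edge appears twice in a symmetric weight matrix.
    cross = 2.0 * sum(dev[i] * dev[j] for i, j in edges)
    total_weight = 2.0 * len(edges)
    return (n / total_weight) * cross / sum(d * d for d in dev)

def join_counts(labels, edges):
    """Count edges joining 1-1 (BB), 1-0 (BW) and 0-0 (WW) for binary labels."""
    counts = {"BB": 0, "BW": 0, "WW": 0}
    for i, j in edges:
        s = labels[i] + labels[j]
        counts["BB" if s == 2 else "BW" if s == 1 else "WW"] += 1
    return counts

# Hypothetical 4x4 tissue: the left two columns belong to the cluster (1), the rest do not (0).
labels = [1 if c < 2 else 0 for r in range(4) for c in range(4)]
edges = rook_edges(4, 4)
moran = morans_i(labels, edges)
joins = join_counts(labels, edges)
```

Both statistics flag the same spatial coherence here: Moran's I is clearly positive, and the join counts show many same-label joins and few mixed joins. The join counts are arguably easier to interpret for a cluster label, since they are defined directly on binary data.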

    Below are minor technical comments:

    1. The notation for the LISI F1 score in the Methods sections is very confusing. Based on context and the definition of the F1 score, you probably meant to put parentheses around 1 - cLISInorm.
    2. Typo in "SCAlEX" in Supplementary Figure S5a; you seem to mean "SCALEX". It's more aesthetically pleasing to be consistent in capitalizing according to the original names of the packages in Supplementary Figure S5.
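A minimal sketch of the reading suggested in the first technical comment, assuming the LISI F1 score is the harmonic mean of the normalized iLISI and (1 - normalized cLISI), with both inputs already scaled to [0, 1]. The function name and the example values are hypothetical, and the exact formula in the manuscript may differ.

```python
def lisi_f1(ilisi_norm, clisi_norm):
    """Harmonic mean of batch mixing (iLISI) and cluster purity (1 - cLISI).

    Assumes both inputs are already normalized to the range [0, 1].
    """
    for name, value in (("ilisi_norm", ilisi_norm), ("clisi_norm", clisi_norm)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    purity = 1.0 - clisi_norm  # the parenthesized term: higher is better
    denom = purity + ilisi_norm
    if denom == 0.0:
        return 0.0
    return 2.0 * purity * ilisi_norm / denom

# Hypothetical scores: good mixing with low cLISI versus good mixing with high cLISI.
good = lisi_f1(ilisi_norm=0.8, clisi_norm=0.2)
poor = lisi_f1(ilisi_norm=0.8, clisi_norm=0.9)
```

Under this reading a high cLISI lowers the F1 score, which is the consistency check raised in comment 8 about Supplementary Figure S4b and Figure 2c.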

    Re-review

    For the most part, the authors have satisfactorily addressed concerns raised by the reviewers. Below are my followup comments on the revised manuscript:

    1. The authors missed the point of my second comment on case/control studies. What I was asking for is the performance of spatiAlign and other related packages when integrating case datasets and control datasets while preserving biological differences of interest to the study. For example, data from healthy liver (control) and hepatic steatosis (case) are integrated. Case and control samples were collected from different patients and may be mounted on different slides. How well does spatiAlign preserve differences between healthy and steatosis, while correcting for technical batch effect? In Figure S7, the two sub-slices are still from the same disease condition. Case/control studies should at least be mentioned in the Discussion section.
    2. The authors have provided thoughtful explanations on data scaling, number of latent dimensions, and number of neighbors in the k nearest neighbor graph in the response to reviewers. However, these explanations are not found in the manuscript or on the documentation website. Because these explanations are very relevant to users, it would be helpful to add them to either the manuscript or the documentation website.
    3. For the bar charts, I suggest assigning a fixed color to each data integration method and keeping it consistent throughout this study. Right now the bar charts don't have a consistent color scheme even within the same figure. Keeping a consistent color scheme can reduce the mental burden of readers since the colors are a stand-in for the different methods. Also, a colorblind-friendly palette should be used.
    4. I agree with Reviewer 3 that the grammar in this paper should be improved. For example, in lines 75-76, "in which gene expression is adjustment" should be "in which gene expression is adjusted". In lines 82-83, the "adjusted" in "laminar organization with adjusted, and clear boundaries between regions" does not make sense given the context referring to Figure 2f. In line 332, "the benchmarking methods" should be "the benchmarked methods", because the methods are being benchmarked and the methods themselves are not meant for benchmarking. Grammar in the newly added section from line 344 onwards should be corrected.