A comparative study of statistical methods for identifying differentially expressed genes in spatial transcriptomics
Abstract
Spatial transcriptomics (ST) provides unprecedented insights into gene expression patterns while retaining spatial context, making it a valuable tool for understanding complex tissue architectures, such as those found in cancers. Seurat, by far the most popular tool for analyzing ST data, uses the Wilcoxon rank-sum test by default for differential expression analysis. However, because it is a nonparametric method that disregards spatial correlations, the Wilcoxon test can lead to inflated false positive rates and misleading findings. This limitation highlights the need for a more robust statistical approach that effectively incorporates spatial correlations. To this end, we propose a Generalized Score Test (GST) in the Generalized Estimating Equations (GEEs) framework as a robust solution for differential gene expression analysis in ST. We conducted a comprehensive comparison of the GST with existing methods, including the Wilcoxon rank-sum test and GEEs with the robust Wald test. Extensive simulations showed that, by appropriately accounting for spatial correlations, the GST achieved superior Type I error control and comparable power relative to the other methods. Applications to ST datasets from breast and prostate cancer showed that the differentially expressed genes identified by the GST were enriched in pathways directly implicated in cancer progression. In contrast, the genes identified by the Wilcoxon test were enriched in non-cancer pathways and included substantial false positives, highlighting the test's limitations for spatially structured data. Our findings suggest that the GST approach is well-suited for ST data, offering more accurate identification of biologically relevant gene expression changes. We have implemented the proposed method in the R package “SpatialGEE”, available on GitHub.
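The claim that ignoring spatial correlation inflates false positives can be illustrated with a small simulation. The sketch below is not the authors' simulation design: it assumes a hypothetical layout in which each of two regions contains 5 spatial clusters of 20 spots, models spatial correlation as a shared random effect per cluster, and uses Gaussian expression values with no true difference between the regions. The function names and all parameter values are illustrative assumptions. The Wilcoxon rank-sum test is implemented with its usual normal approximation.

```python
import math
import random


def wilcoxon_rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Sum of the ranks held by the first sample in the pooled ordering.
    w = sum(rank for rank, (_, g) in enumerate(combined, start=1) if g == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    z = (w - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_region(rng, n_clusters, spots_per_cluster, cluster_sd):
    """Spots in the same cluster share a random effect (assumed correlation model)."""
    values = []
    for _ in range(n_clusters):
        shared = rng.gauss(0.0, cluster_sd)
        values.extend(shared + rng.gauss(0.0, 1.0) for _ in range(spots_per_cluster))
    return values


def rejection_rate(cluster_sd, n_rep=1000, alpha=0.05, seed=1):
    """Fraction of null datasets (no true group difference) that the test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        x = simulate_region(rng, 5, 20, cluster_sd)
        y = simulate_region(rng, 5, 20, cluster_sd)
        if wilcoxon_rank_sum_p(x, y) < alpha:
            rejections += 1
    return rejections / n_rep


rate_independent = rejection_rate(cluster_sd=0.0)  # spots are independent
rate_correlated = rejection_rate(cluster_sd=1.0)   # spots correlated within clusters
print(f"Type I error, independent spots: {rate_independent:.3f}")
print(f"Type I error, correlated spots:  {rate_correlated:.3f}")
```

With independent spots the rejection rate should stay near the nominal 5% level, whereas with within-cluster correlation it should rise well above it, because the test treats 100 correlated spots per region as 100 independent observations.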
Author Summary
Spatial transcriptomics (ST) provides unprecedented insights into gene expression patterns while retaining spatial context, making it a valuable tool for studying complex tissue architectures and disease etiology. Seurat, a widely used software tool for analyzing ST data, relies by default on the Wilcoxon rank-sum test for differential expression analysis. However, this test ignores spatial correlations, which can lead to inflated false positive rates and misleading findings. This limitation highlights the need for a more robust statistical approach that effectively incorporates spatial correlations. To this end, we have proposed a Generalized Score Test (GST) in the Generalized Estimating Equations (GEEs) framework as a robust solution for differential gene expression analysis in ST. Extensive simulations showed that, by appropriately accounting for spatial correlations, the GST achieved superior false positive rate control and comparable power relative to other methods. Applications to ST datasets from breast and prostate cancer showed that the GST identified cancer-related genes and pathways more accurately than the Wilcoxon test, which produced misleading results. We have implemented the proposed method in the R package “SpatialGEE”, available on GitHub.
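To make the idea of a score test with a cluster-robust variance concrete, here is a deliberately simplified sketch. It is not the SpatialGEE implementation, and the summary above does not specify the model details: the sketch assumes an identity link, an independence working correlation, a single binary group covariate with an intercept as the only nuisance parameter, and spots already assigned to hypothetical spatial clusters. Under those assumptions the generalized score statistic reduces to the squared sum of per-cluster score contributions divided by the sum of their squares, referred to a chi-square distribution with 1 degree of freedom. All function names and parameter values are illustrative.

```python
import math
import random


def cluster_robust_score_test(clusters):
    """clusters: list of (group_indicator, [expression values]) pairs.

    Returns (statistic, p_value) for H0: no group effect, using a
    cluster-level (sandwich-type) variance estimate of the score.
    """
    n = sum(len(vals) for _, vals in clusters)
    y_bar = sum(sum(vals) for _, vals in clusters) / n       # intercept under H0
    x_bar = sum(x * len(vals) for x, vals in clusters) / n   # adjusts for the intercept
    # One score contribution per cluster, evaluated under the null model.
    scores = [sum((x - x_bar) * (y - y_bar) for y in vals) for x, vals in clusters]
    total = sum(scores)
    robust_var = sum(s * s for s in scores)
    stat = total * total / robust_var
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    return stat, p_value


def simulate_clusters(rng, shift, n_clusters=5, spots=20, cluster_sd=1.0):
    """Two groups of clusters; spots in a cluster share a random effect (assumed)."""
    clusters = []
    for x in (0, 1):
        for _ in range(n_clusters):
            shared = rng.gauss(0.0, cluster_sd)
            vals = [x * shift + shared + rng.gauss(0.0, 1.0) for _ in range(spots)]
            clusters.append((x, vals))
    return clusters


def rejection_rate(shift, n_rep=1000, alpha=0.05, seed=7):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        _, p = cluster_robust_score_test(simulate_clusters(rng, shift))
        if p < alpha:
            hits += 1
    return hits / n_rep


type1_rate = rejection_rate(shift=0.0)  # no true group effect
power = rejection_rate(shift=3.0)       # large true group effect
print(f"Type I error under within-cluster correlation: {type1_rate:.3f}")
print(f"Power for a shift of 3: {power:.3f}")
```

Because the variance is estimated from cluster-level contributions, within-cluster correlation is absorbed into the denominator, which is why the rejection rate under the null should stay close to the nominal level in this toy setting. A realistic analysis of count data would need a suitable link and variance function and a working correlation structure, none of which this sketch attempts.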