Profiling ranked list enrichment scoring in sparse data elucidates algorithmic tradeoffs



Abstract

Gene Set Enrichment Analysis (GSEA) is a method for quantifying pathway and process activation in groups of samples, and its single-sample version (ssGSEA) scores activation using mRNA abundance in a single sample. GSEA and ssGSEA were developed for data from “bulk” sample technologies such as microarrays and bulk RNA-sequencing (RNA-seq), rather than for individual cells. The growing use of single-cell RNA-sequencing (scRNA-seq) raises the possibility of using ssGSEA to quantify pathway and process activation in individual cells. However, scRNA-seq data are much sparser than bulk RNA-seq data. Here we show that ssGSEA, as designed for bulk data, is subject to score uncertainty and other technical issues when applied to individual cells from scRNA-seq data. We also show that ssGSEA can be applied robustly to “pseudobulk” aggregate groups of a few hundred to a few thousand cells, provided appropriate normalization is used. Finally, in comparing this approach to other ranked list enrichment methods, we find that the UCell method is the most robust to sparsity. We have made the aggregate cell version of ssGSEA available as a Python package and GenePattern module, and will also modularize UCell for use on GenePattern.
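As a rough illustration of the pseudobulk idea described in the abstract, the sketch below sums raw counts over groups of cells and then rescales each aggregate. It is not the authors' package: the function name `pseudobulk_cpm`, the toy Poisson data, and the choice of counts-per-million (CPM) as the normalization are all assumptions made for illustration, since the abstract does not state which normalization is appropriate.

```python
import numpy as np

def pseudobulk_cpm(counts, labels):
    """Sum raw counts over the cells in each group, then scale each
    aggregate profile to counts per million (CPM, an assumed choice)."""
    counts = np.asarray(counts, dtype=float)   # shape: cells x genes
    labels = np.asarray(labels)                # one group label per cell
    groups = np.unique(labels)
    # One row per group: counts summed across that group's cells.
    summed = np.vstack([counts[labels == g].sum(axis=0) for g in groups])
    totals = summed.sum(axis=1, keepdims=True)
    # Guard against division by zero for aggregates with no counts.
    cpm = np.divide(summed * 1e6, totals,
                    out=np.zeros_like(summed), where=totals > 0)
    return groups, cpm

# Hypothetical toy data: 600 very sparse cells, 50 genes, two groups of 300.
rng = np.random.default_rng(0)
counts = rng.poisson(0.1, size=(600, 50))
labels = np.repeat(["A", "B"], 300)

groups, cpm = pseudobulk_cpm(counts, labels)
sparsity_cells = (counts == 0).mean()   # fraction of zeros per cell profile
sparsity_bulk = (cpm == 0).mean()       # fraction of zeros after aggregation
```

In this toy setting the aggregated profiles contain far fewer zeros than the individual cells, which is the property that makes a rank-based score such as ssGSEA better behaved on aggregates than on single cells.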
