pyVIPER: A fast and scalable Python package for rank-based enrichment analysis of single-cell RNASeq data
Abstract
Summary
Single-cell sequencing has revolutionized biomedical research by offering insights into cellular heterogeneity at unprecedented resolution. Yet the low signal-to-noise ratio characteristic of single-cell RNA sequencing (scRNASeq) challenges quantitative analyses. We have shown that gene regulatory network (GRN) analysis can help overcome this obstacle and support mechanistic elucidation of cellular state determinants, for example by using the VIPER algorithm to identify Master Regulator (MR) proteins from gene expression data. As the size and complexity of scRNASeq datasets grow, a key challenge is the need for highly scalable tools that can analyze datasets with up to hundreds of thousands of cells. To address it, we introduce pyVIPER, a fast, memory-efficient, and highly scalable Python toolkit for assessing protein activity in large-scale scRNASeq datasets. pyVIPER supports multiple enrichment analysis algorithms, data transformation and postprocessing modules, a novel data structure for GRN manipulation, and seamless integration with AnnData, Scanpy, and several widely adopted machine learning libraries. Benchmarking against VIPER shows an orders-of-magnitude reduction in runtime for large datasets (from hours to minutes), thus supporting VIPER-based analysis of virtually any large-scale single-cell dataset, as well as integration with other Python-based tools.
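To make the idea of rank-based enrichment concrete, the following is a minimal NumPy sketch, not the pyVIPER API and not the VIPER algorithm itself. It assumes a hypothetical regulon given as target-gene column indices with +1/-1 regulation signs. It scores each cell by the signed mean of its targets' within-cell expression ranks, with all cells processed in one vectorized pass. The function name `rank_enrichment` and its parameters are illustrative assumptions.

```python
import numpy as np

def rank_enrichment(expr, targets, signs):
    """Simplified rank-based enrichment score per cell (illustrative only).

    expr    : (n_cells, n_genes) array of expression values
    targets : integer column indices of a hypothetical regulon's target genes
    signs   : +1 (activated) or -1 (repressed) for each target
    """
    expr = np.asarray(expr, dtype=float)
    targets = np.asarray(targets)
    signs = np.asarray(signs, dtype=float)
    n_genes = expr.shape[1]

    # Rank genes within each cell (0 = lowest expression); ties are broken arbitrarily.
    ranks = expr.argsort(axis=1).argsort(axis=1)

    # Rescale ranks to the open interval (-1, 1) so that the middle rank maps to 0.
    scaled = 2.0 * (ranks + 0.5) / n_genes - 1.0

    # Signed mean over the regulon targets: positive when activated targets rank
    # high and repressed targets rank low within a cell.
    return (scaled[:, targets] * signs).mean(axis=1)

# Two toy cells with four genes; genes 2 and 3 are activated targets.
toy = np.array([[1.0, 2.0, 3.0, 4.0],
                [4.0, 3.0, 2.0, 1.0]])
scores = rank_enrichment(toy, targets=[2, 3], signs=[1, 1])
print(scores)  # cell 0 scores 0.5, cell 1 scores -0.5
```

Because only within-cell ranks enter the score, it is insensitive to per-cell scaling of the expression values, which is one reason rank-based approaches suit noisy single-cell data.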
Availability and Implementation
pyVIPER is available on GitHub (https://github.com/alevax/pyviper) and PyPI (https://pypi.org/project/viper-in-python/).
Contact
av2729@cumc.columbia.edu
Supplementary information
Supplementary data are available at Bioinformatics online. Accompanying data for the tutorials are available on Zenodo (https://zenodo.org/records/10059791).