Scalable high-performance single cell data analysis with BPCells

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

The growth of single-cell datasets to multi-million cell atlases has uncovered major scalability problems for single-cell analysis software. Here, we present BPCells, a package for high-performance single-cell analysis of RNA-seq and ATAC-seq datasets. BPCells uses disk-backed streaming compute algorithms to reduce memory requirements by nearly 70-fold compared to in-memory workflows with little to no loss of execution speed. BPCells also introduces high-performance compressed formats based on bitpacking compression for ATAC-seq fragment files and single-cell sparse matrices. These novel compression algorithms help to accelerate disk-backed analysis by reducing data transfer from disk, while providing the lowest computational overhead of all compression algorithms tested. Using BPCells, we perform normalization and PCA of a 44 million cell dataset on a laptop, demonstrating that BPCells makes working with the largest contemporary single-cell datasets feasible on modest hardware, while leaving headroom on servers for future datasets an order of magnitude larger.

Article activity feed