ScaleSC: A superfast and scalable single cell RNA-seq data analysis pipeline powered by GPU
Abstract
The rise of large-scale single-cell RNA-seq datasets has made data processing a bottleneck, because existing tools run slowly at this scale. Leveraging advances in the GPU computing ecosystem, such as CuPy, and building on Scanpy and the rapids-singlecell package, we developed ScaleSC, a GPU-accelerated solution for single-cell data processing. ScaleSC delivers over a 20x speedup through GPU computing and significantly improves scalability: by overcoming the memory bottleneck, it handles datasets of 10–40 million cells with over 1000 batches on a single A100 card, far surpassing rapids-singlecell, which can process only 1 million cells without multi-GPU support. We also resolved discrepancies between the GPU and CPU implementations of the algorithms to ensure consistent results. In addition to these core optimizations, we developed new advanced tools for marker gene identification, cluster merging, and more, all with GPU-optimized implementations integrated into the package. Designed for ease of use, ScaleSC is compatible with Scanpy workflows and requires minimal adaptation from users. The ScaleSC package (https://github.com/interactivereport/ScaleSC) promises significant benefits for the single-cell RNA-seq computational community.