The impact of package selection and versioning on single-cell RNA-seq analysis

Joseph M Rich
Lambda Moses
Pétur Helgi Einarsson
Kayla Jackson
Laura Luebbert
A. Sina Booeshaghi
Sindri Antonsson
Delaney K. Sullivan
Nicolas Bray
Páll Melsted
Lior Pachter

Abstract

Standard single-cell RNA-sequencing (scRNA-seq) analysis workflows consist of converting raw read data into cell-gene count matrices through sequence alignment, followed by analyses including filtering, highly variable gene selection, dimensionality reduction, clustering, and differential expression analysis. Seurat and Scanpy are the most widely used packages implementing such workflows, and they are generally thought to implement individual steps similarly. We investigate in detail the algorithms and methods underlying Seurat and Scanpy and find that there are, in fact, considerable differences in their outputs. The extent of the differences between the programs is approximately equivalent to the variability that would be introduced in benchmarking scRNA-seq datasets by sequencing less than 5% of the reads or by analyzing less than 20% of the cell population. Additionally, different versions of Seurat and Scanpy can produce very different results, especially during parts of differential expression analysis. Our analysis highlights the need for users of scRNA-seq to carefully assess the tools on which they rely, and the importance of developers of scientific software prioritizing transparency, consistency, and reproducibility in their tools.
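
The abstract reports considerable differences between Seurat and Scanpy outputs but does not state here how agreement is scored. As a minimal sketch, assuming two common agreement measures (the Jaccard index for overlap between highly variable gene lists, and the adjusted Rand index for agreement between clusterings of the same cells), such differences could be quantified as follows. The function names and toy inputs are hypothetical illustrations, not the authors' code or any package's API.

```python
from collections import Counter
from math import comb


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| between two gene sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same cells")
    n = len(labels_a)
    # Contingency counts: cells assigned to cluster i by A and cluster j by B.
    pairs = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)
    index = sum(comb(c, 2) for c in pairs.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    total = comb(n, 2)
    expected = sum_rows * sum_cols / total if total else 0.0
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (e.g. both labelings put every cell in one cluster).
        return 1.0
    return (index - expected) / (max_index - expected)


# Hypothetical toy outputs from two pipelines run on the same data.
hvg_tool_a = {"GENE1", "GENE2", "GENE3", "GENE4"}
hvg_tool_b = {"GENE2", "GENE3", "GENE4", "GENE5"}
clusters_a = [0, 0, 1, 1, 2, 2]
clusters_b = [1, 1, 0, 0, 2, 0]

print(f"HVG Jaccard: {jaccard(hvg_tool_a, hvg_tool_b):.3f}")
print(f"Cluster ARI: {adjusted_rand_index(clusters_a, clusters_b):.3f}")
```

The Jaccard index ignores gene ranking, and the adjusted Rand index ignores cluster label names, so two clusterings that differ only by relabeling still score 1.0.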

Version published on bioRxiv (DOI 10.1101/2024.04.04.588111), Apr 5, 2024