Enabling reproducible and reusable genetic demultiplexing benchmarking with Nextflow and Apptainer
Abstract
Reproducibility, reusability, and portability have been identified as key requirements for analysis pipelines in genomics. A previous study found that fewer than 4% of notebooks used for biomedical research were fully reproducible. Reusable and portable software is required to ensure compatibility with federated and cloud computing. The field therefore requires computational workflows that meet high standards for reproducibility, reusability, and portability, so that benchmarking can be repeated and updated periodically. This is particularly important in areas where new methods are frequently released, such as demultiplexing of scRNA-seq data, a critical early step in most scRNA-seq analysis pipelines.
To address this, we developed demux_bench, a pipeline for benchmarking genetic demultiplexing methods in single-cell RNA sequencing that meets the gold standard for reproducibility of computational workflows and incorporates best practices for reusability and portability. We used the workflow manager Nextflow to enable the simulation and testing of various benchmarking scenarios and methods in parallel from a single pipeline execution. Different experimental configurations can be simulated, including varying doublet percentages and class size imbalance. The pipeline includes the genotype-free methods Vireo and souporcell, and additional existing or novel methods can be added modularly. Demux_bench is configurable to reproduce specific analyses and generalisable to address new research questions. Software dependencies are handled through containers via Apptainer, allowing portability to different compute environments and avoiding the need for manual installation of software. Demux_bench is available on WorkflowHub (https://workflowhub.eu/workflows/1769) and can also be run on Galaxy and other platforms through RO-Crates.
Demux_bench facilitates gold-standard benchmarking for genetic demultiplexing through reproducibility, reusability, scalability, and portability.
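
To make the simulated configurations more concrete, the following is a minimal Python sketch of how a multiplexed pool with a chosen doublet percentage and class size imbalance could be generated. It is an illustration only and not the implementation used in demux_bench: the function name `simulate_pool`, its parameters, and the donor labels are all hypothetical, and the sketch assigns donor labels to droplets without simulating any reads or genotypes.

```python
import random


def simulate_pool(n_cells, donor_weights, doublet_fraction, seed=0):
    """Assign simulated droplets to donors (hypothetical helper, not demux_bench code).

    n_cells          -- total number of droplets to simulate
    donor_weights    -- dict mapping donor label to relative pool share;
                        unequal weights give class size imbalance
    doublet_fraction -- fraction of droplets that contain two cells (0 to 1)
    seed             -- seed for the random number generator, so that the
                        same configuration always yields the same pool
    """
    if not 0.0 <= doublet_fraction <= 1.0:
        raise ValueError("doublet_fraction must be between 0 and 1")

    rng = random.Random(seed)
    donors = list(donor_weights)
    weights = [donor_weights[d] for d in donors]
    n_doublets = round(n_cells * doublet_fraction)

    droplets = []
    for i in range(n_cells):
        first = rng.choices(donors, weights=weights, k=1)[0]
        if i < n_doublets:
            # A doublet holds two cells; both may come from the same donor.
            second = rng.choices(donors, weights=weights, k=1)[0]
            droplets.append((first, second))
        else:
            droplets.append((first,))

    # Shuffle so doublets are not grouped at the start of the list.
    rng.shuffle(droplets)
    return droplets


# Example scenario: three donors with imbalanced shares and 10% doublets.
droplets = simulate_pool(
    n_cells=1000,
    donor_weights={"donor_A": 0.6, "donor_B": 0.3, "donor_C": 0.1},
    doublet_fraction=0.10,
)
n_doublets = sum(len(d) == 2 for d in droplets)
print(f"{n_doublets} doublets out of {len(droplets)} droplets")
```

Fixing the seed in a sketch like this mirrors the reproducibility goal described above: rerunning the same configuration produces the same simulated pool, while changing `donor_weights` or `doublet_fraction` defines a new benchmarking scenario.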