V-pipe 3.0: a sustainable pipeline for within-sample viral genetic diversity estimation
This article has been reviewed by the following groups
Listed in
- Evaluated articles (GigaScience)
- Evaluated articles (GigaByte)
Abstract
The large amount and diversity of viral genomic datasets generated by next-generation sequencing technologies poses a set of challenges for computational data analysis workflows, including rigorous quality control, adaptation to higher sample coverage, and tailored steps for specific applications. Here, we present V-pipe 3.0, a computational pipeline designed for analyzing next-generation sequencing data of short viral genomes. It is developed to enable reproducible, scalable, adaptable, and transparent inference of genetic diversity of viral samples. By presenting two large-scale data analysis projects, we demonstrate the effectiveness of V-pipe 3.0 in supporting sustainable viral genomic data science.
Article activity feed
-
Competing Interest Statement: The authors have declared no competing interest.
This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giae065), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license.
Reviewer: Shilpa Garg
V-pipe 3.0 is introduced as an advanced computational pipeline tailored for the analysis of next-generation sequencing data from short viral genomes. Designed to meet the challenges posed by the vast and diverse datasets generated by these technologies, V-pipe 3.0 emphasizes reproducibility, scalability, adaptability, and transparency. It achieves this by adhering to Snakemake's best practices, allowing easy swapping of virus-specific configuration files, and providing thoroughly tested examples online.
The utility of V-pipe 3.0 is showcased through its application in two extensive data analysis projects, proving its efficacy in sustainable viral genomic data science. Central to V-pipe 3.0 is its capacity for estimating viral diversity from sequencing data. A versatile benchmarking module has been developed to continuously assess various diversity estimation methods, accommodating the rapid advancements within this field. The pipeline simplifies the inclusion of new tools and datasets, supporting both synthetic and real experimental data. However, challenges in global haplotype reconstruction highlight the need for scalable methods that can accurately reflect the complex population structures of viruses and manage the uncertainties in the results.
Some additional clarification in the manuscript would be appreciated.
- I'm curious about how the efficiency is attained.
- Is it possible to utilize V-pipe for analyzing other microorganisms?
- The authors might consider directing readers to the following review article for reference: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02328-9
- Identifying specific genes or genome regions with high polymorphism across different populations would be fascinating. How does V-pipe handle analysis in these highly variable regions?
-
Reviewer: Fotis Psomopoulos
The manuscript presents V-pipe 3.0, a computational pipeline designed for analyzing next-generation sequencing data of short viral genomes. After an overview of the challenge the tool addresses, i.e., the need for continuous benchmarking of methods whose performance varies across scenarios, the paper continues with a detailed listing of the results, highlighting the key elements of Reproducibility, Scalability, Adaptability and Transparency.
The next section provides some details on the three applications / demonstrations of V-pipe 3.0, i.e., the Swiss SARS-CoV-2 Sequencing Consortium, the Swiss surveillance of SARS-CoV-2 genomic variants in wastewater, and the global haplotype reconstruction benchmark. This is followed by a comprehensive comparison of V-pipe 3.0 to other relevant viral bioinformatics pipelines for within-sample diversity estimation (specifically nf-core/viralrecon, HAPHPIPE and ViralFlow), focusing on functionalities and sustainability. A further section discusses the main advantages of V-pipe 3.0 and the rationale for some of the identified drawbacks.
The paper concludes with a thorough description of the underlying methods of V-pipe 3.0 and of the data used. Overall, the paper gives a very good presentation of V-pipe and makes a strong case for its use and value in a real-world challenge. An overall comment is that there is some confusion about the role of V-pipe 3.0 as a workflow, i.e., whether it is a dynamic system that uses different tools per step based on user input, or an automated system that benchmarks the analysis using (e.g.) synthetic data as the baseline. In either case, there are also a few unclear points in the manuscript itself that could be further improved.
Specifically:
- It is not clear how V-pipe 3.0 differs from V-pipe. Although there is an indication of significant differences, an overview of the new features implemented in this version and/or a small introductory paragraph would be useful.
- In the "Results" section, lines 130 - 225 appear to refer to the implemented methodology and might be better placed in the "Methods" section.
- In the "Results" section, lines 135 - 138 imply that GitHub Actions are used to ensure reproducibility of the workflow. Some more elaboration on this would be very useful, as GitHub Actions are commonly used to automate processes (such as testing, conflict resolution, etc.). In particular, one reproducibility issue that GitHub Actions might not be able to resolve is dependency conflicts that are specific to the particular system being tested.
- In the "Results" section, lines 139 - 146, it is not clear how the benchmark study contributes to the overall reproducibility of V-pipe 3.0. Some further explanation of the rationale would be very useful here.
- In the "Results" section, lines 179 - 183, it is not clear how Git and GitHub ensure adaptability of any new features that are implemented. A version control or automation system can usually facilitate the integration of new features, but it is not readily evident how it supports, ensures, or facilitates adaptability. A definition of "adaptability" in this particular context might also help.
- In the "Applications" section, it is not clear which version (V-pipe or V-pipe 3.0) was used for the overall analysis, especially in the wastewater use case.
- In the section "Comparison to other workflows", it is not very clear which tools are implemented within V-pipe 3.0, how it differs from the previous version (V-pipe), and how these differ from other pipelines. A table summarizing these details and highlighting the differences would be very useful here.
Moreover, there are a few minor points that would enhance the readers' understanding:
- (minor) In Section "2.1 Reproducibility", it is mentioned that all software dependencies are defined in Conda environments, making V-pipe 3.0 portable between different computing platforms. Is there a particular reason why V-pipe itself isn't implemented as a Conda package directly?
- (minor) More often than not, the pandemic is named "COVID-19", in contrast to the virus, which is named "SARS-CoV-2". It may be useful to amend the references to the "SARS-CoV-2 pandemic" accordingly.