SARS-CoV-2 RECoVERY: a multi-platform open-source bioinformatic pipeline for the automatic construction and analysis of SARS-CoV-2 genomes from NGS sequencing data
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
- Evaluated articles (Rapid Reviews Infectious Diseases)
Abstract
Background
Since its first appearance in December 2019, the novel Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has spread worldwide, causing an increasing number of cases and deaths (35,537,491 and 1,042,798, respectively, at the time of writing; https://covid19.who.int). Similarly, the number of complete viral genome sequences produced by Next Generation Sequencing (NGS) has increased exponentially. NGS enables the rapid accumulation of large numbers of sequences. However, the bioinformatics analyses are critical and require combined approaches, which can be challenging for non-bioinformaticians.
Results
A user-friendly, sequencing-platform-independent bioinformatics pipeline, named SARS-CoV-2 RECoVERY (REconstruction of CoronaVirus gEnomes & Rapid analYsis), has been developed to build complete SARS-CoV-2 genomes from raw sequencing reads and to investigate variants. The genomes built by SARS-CoV-2 RECoVERY were compared with those obtained using other available software, and SARS-CoV-2 RECoVERY showed comparable or better performance. Depending on the number of reads, complete genome reconstruction and variant analysis can be achieved in less than one hour. The pipeline was implemented in the open-source, multi-purpose Galaxy platform, allowing easy access to the software and providing computational and storage resources to the community.
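According to the Methods sentences quoted in the SciScore report, platform independence is handled at the mapping step: Bowtie2 is used for Illumina and Ion Torrent reads and Minimap2 for Nanopore reads. The minimal Python sketch below illustrates that dispatch. The `Platform` enum, the function names, the single-end input and the specific flags (`-x map-ont`, the thread count) are illustrative assumptions, not the pipeline's actual Galaxy configuration.

```python
from enum import Enum
from typing import List


class Platform(Enum):
    ILLUMINA = "illumina"
    ION_TORRENT = "ion_torrent"
    NANOPORE = "nanopore"


def choose_aligner(platform: Platform) -> str:
    """Mapper per platform, as stated in the Methods: Minimap2 for Nanopore,
    Bowtie2 for Illumina and Ion Torrent."""
    return "minimap2" if platform is Platform.NANOPORE else "bowtie2"


def build_map_command(platform: Platform, reference: str, reads: str,
                      threads: int = 4) -> List[str]:
    """Assemble an illustrative single-end mapping command (flags are assumptions).

    `reference` is a FASTA (or .mmi) file for minimap2, but an index
    basename for bowtie2; both tools write SAM to stdout.
    """
    if choose_aligner(platform) == "minimap2":
        # -a: emit SAM alignments; -x map-ont: preset for Oxford Nanopore reads
        return ["minimap2", "-a", "-x", "map-ont", "-t", str(threads), reference, reads]
    # -p: threads; -x: Bowtie2 index basename; -U: unpaired reads
    return ["bowtie2", "-p", str(threads), "-x", reference, "-U", reads]
```

Keeping the platform choice in one function means the rest of the workflow (coverage, annotation, variant calling) can stay identical across platforms; paired-end Illumina input would need Bowtie2's `-1`/`-2` options instead of `-U`.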
Conclusions
SARS-CoV-2 RECoVERY is intended for the scientific community working on SARS-CoV-2 phylogeny and molecular characterisation, providing an efficient tool for the complete reconstruction of the viral genome and the analysis of its variants. Additionally, its simple interface and the ability to use it through a Galaxy instance, without the need to set up computing and storage infrastructure, make SARS-CoV-2 RECoVERY a resource also for virologists with little or no bioinformatics skills.
Availability and implementation
The SARS-CoV-2 RECoVERY (REconstruction of CoronaVirus gEnomes & Rapid analYsis) pipeline is implemented in the Galaxy instance ARIES (https://aries.iss.it).
Article activity feed
Martin Höelzer
Review 1: "SARS-CoV-2 RECoVERY: a multi-platform open-source bioinformatic pipeline for the automatic construction and analysis of SARS-CoV-2 genomes from NGS sequencing data"
Reviewer: Martin Höelzer (Robert Koch Institute) 📒📒📒 ◻️◻️
SciScore for 10.1101/2021.01.16.425365:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms (sentences from the manuscript, with the detected resources)
- Read quality analysis and trimming: The reads imported in FASTQ format are trimmed with the Trimmomatic tool (Bolger et al., 2014) to remove low-quality bases (or N bases) from both ends of each read and to exclude reads shorter than 30 base pairs (bp). Resource: Trimmomatic; suggested: (Trimmomatic, RRID:SCR_011848)
- Subtraction of human sequences: Trimmed reads are mapped using Bowtie2 software (Langmead et al., 2012) onto the reference human genome downloaded from "The Genome Reference Consortium" database (https://www.ncbi.nlm.nih.gov/grc) to remove the human genomic sequences. Genome reconstruction: The recovered unaligned reads are mapped onto the reference sequence of SARS-CoV-2 using Bowtie2 for Illumina and Ion Torrent reads, and Minimap2 (Li, 2018) for Nanopore reads. Resource: Bowtie2; suggested: (Bowtie 2, RRID:SCR_016368)
- Coverage analysis: The coverage analysis and nucleotide distribution are performed using the tool Qualimap 2 (Okonechnikov et al., 2016). Resource: Qualimap; suggested: (QualiMap, RRID:SCR_001209)
- ORF annotation: Annotation is performed with the BLASTn tool (Megablast) using the SARS-CoV-2 reference ORFs (Open Reading Frames). Resource: BLASTn; suggested: (BLASTN, RRID:SCR_001598)
- The SnpEff tool (Cingolani et al., 2012) is finally used for variant annotation, using the reference genome of SARS-CoV-2 and the iVar output (TSV) converted to VCF format. Resource: SnpEff; suggested: (SnpEff, RRID:SCR_005191)
- Performance of the pipeline in comparison with other software: One hundred raw NGS datasets from Illumina, 100 from Nanopore and 50 from Ion Torrent platforms were downloaded from the NCBI Sequence Read Archive (SRA). Resource: NCBI database Sequence Read Archive; suggested: None
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
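The trimming rule in the first Methods sentence (cut low-quality or N bases from both read ends, then discard reads shorter than 30 bp) can be illustrated with a simplified Python sketch. It is not Trimmomatic's implementation: the quality cut-off of 20, Phred+33 encoding and the name `trim_read` are assumptions; only the 30 bp minimum comes from the text. Trimmomatic itself offers comparable behaviour through its LEADING, TRAILING and MINLEN steps.

```python
from typing import Optional, Tuple

MIN_LENGTH = 30  # from the Methods: reads shorter than 30 bp are excluded
MIN_QUAL = 20    # assumption: the actual quality cut-off is not stated


def trim_read(seq: str, qual: str, min_qual: int = MIN_QUAL,
              min_length: int = MIN_LENGTH,
              phred_offset: int = 33) -> Optional[Tuple[str, str]]:
    """Trim N or low-quality bases from both ends of a read.

    Returns the trimmed (sequence, quality) pair, or None if fewer than
    `min_length` bases remain. Interior low-quality bases are left untouched.
    """
    def is_bad(i: int) -> bool:
        return seq[i] in "Nn" or ord(qual[i]) - phred_offset < min_qual

    start, end = 0, len(seq)
    while start < end and is_bad(start):      # walk in from the 5' end
        start += 1
    while end > start and is_bad(end - 1):    # walk in from the 3' end
        end -= 1
    if end - start < min_length:
        return None                           # too short after trimming
    return seq[start:end], qual[start:end]
```

Only the read ends are inspected, which matches the "both ends of each read" wording; a sliding-window scan (Trimmomatic's SLIDINGWINDOW) would be a different, stricter rule.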
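Human read subtraction keeps only the reads that fail to align to the human reference; these "recovered unaligned reads" are then mapped to SARS-CoV-2. Assuming Bowtie2's SAM output is available, the sketch below extracts unaligned records (SAM FLAG bit 0x4) back to FASTQ. The function name is hypothetical, and Bowtie2 can also write unaligned reads directly with its `--un` (unpaired) and `--un-conc` (paired) options.

```python
from typing import Iterable, Iterator

UNMAPPED = 0x4        # SAM FLAG bit: segment unmapped
REVERSE = 0x10        # SAM FLAG bit: SEQ is reverse complemented
NOT_PRIMARY = 0x900   # secondary (0x100) or supplementary (0x800) record

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_reads_to_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ records for SAM alignments flagged as unmapped."""
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & NOT_PRIMARY:            # keep one record per read
            continue
        if not flag & UNMAPPED:           # aligned to the human genome: drop it
            continue
        if flag & REVERSE:                # restore the original read orientation
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{qname}\n{seq}\n+\n{qual}\n"
```

The retained FASTQ records would then be the input for the SARS-CoV-2 mapping step.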
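The SnpEff sentence mentions converting the iVar variant table (TSV) to VCF before annotation. The conversion script is not described, so the following is a hedged sketch that assumes iVar's usual column names (`REGION`, `POS`, `REF`, `ALT`, `ALT_FREQ`, `TOTAL_DP`, `PASS`) and its indel notation (`+SEQ` for insertions, `-SEQ` for deletions), treating the reported position and REF base as the VCF anchor base. Rows repeated for overlapping GFF features are collapsed; the header lines and `FAIL` filter name are illustrative choices.

```python
import csv
import io
from typing import List

VCF_HEADER = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth at position">',
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate allele frequency">',
    '##FILTER=<ID=FAIL,Description="iVar PASS column is not TRUE">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def ivar_tsv_to_vcf(tsv_text: str) -> List[str]:
    """Convert an iVar variants table (tab-separated, with header) to VCF lines."""
    lines = list(VCF_HEADER)
    seen = set()
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        chrom, pos, ref, alt = row["REGION"], row["POS"], row["REF"], row["ALT"]
        if alt.startswith("+"):    # insertion: anchor base + inserted bases
            alt = ref + alt[1:]
        elif alt.startswith("-"):  # deletion: anchor base + deleted bases become REF
            ref, alt = ref + alt[1:], ref
        key = (chrom, pos, ref, alt)
        if key in seen:            # same variant repeated for another GFF feature
            continue
        seen.add(key)
        status = "PASS" if row["PASS"].upper() == "TRUE" else "FAIL"
        info = f"DP={row['TOTAL_DP']};AF={row['ALT_FREQ']}"
        lines.append("\t".join([chrom, pos, ".", ref, alt, ".", status, info]))
    return lines
```

Reading columns by name rather than position keeps the sketch tolerant of extra annotation columns that different iVar versions may append.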
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- No funding statement was detected.
- No protocol registration statement was detected.