ViralFlow: A Versatile Automated Workflow for SARS-CoV-2 Genome Assembly, Lineage Assignment, Mutations and Intrahost Variant Detection

This article has been Reviewed by the following groups

Read the full article See related articles

Abstract

The COVID-19 pandemic is driven by Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2) that emerged in 2019 and quickly spread worldwide. Genomic surveillance has become the gold standard methodology used to monitor and study this fast-spreading virus and its constantly emerging lineages. The current deluge of SARS-CoV-2 genomic data generated worldwide has put additional pressure on the urgent need for streamlined bioinformatics workflows. Here, we describe a workflow developed by our group to process and analyze large-scale SARS-CoV-2 Illumina amplicon sequencing data. This workflow automates all steps of SARS-CoV-2 reference-based genomic analysis: data processing, genome assembly, PANGO lineage assignment, mutation analysis and the screening of intrahost variants. The pipeline is capable of processing a batch of around 100 samples in less than half an hour on a personal laptop or in less than five minutes on a server with 50 threads. The workflow presented here is available through Docker or Singularity images, allowing for implementation on laptops for small-scale analyses or on high processing capacity servers or clusters. Moreover, the low requirements for memory and CPU cores and the standardized results provided by ViralFlow highlight it as a versatile tool for SARS-CoV-2 genomic analysis.

Article activity feed

  1. SciScore for 10.1101/2021.10.01.21264424: (What is this?)

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Ethicsnot detected.
    Sex as a biological variablenot detected.
    Randomizationnot detected.
    Blindingnot detected.
    Power Analysisnot detected.

    Table 2: Resources

    Software and Algorithms
    SentencesResources
    0.20.1 tool (Chen et al., 2018), where the paired-end reads are trimmed using a minimum read quality threshold of Phred score 20.
    Phred
    suggested: (Phred, RRID:SCR_001017)
    In this step a mapping of paired-end libraries against a reference genome is performed with BWA v.
    BWA
    suggested: (BWA, RRID:SCR_010910)
    The Samtools v.1.9 (Li et al., 2009) and iVar v.
    Samtools
    suggested: (SAMTOOLS, RRID:SCR_002105)

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.