SARS-CoV-2 Whole Genome Amplification and Sequencing for Effective Population-Based Surveillance and Control of Viral Transmission

Abstract

Background

With the gradual reopening of economies and resumption of social life, robust surveillance mechanisms should be implemented to control the ongoing COVID-19 pandemic. Unlike RT-qPCR, which detects the virus without resolving its genome sequence, SARS-CoV-2 whole genome sequencing (cWGS) has the added advantage of identifying cryptic origins of the virus and the extent of community-based transmission versus new viral introductions, which can in turn inform public health policy decisions. However, the practical and cost considerations of cWGS should be addressed before it is widely implemented.

Methods

We performed shotgun transcriptome sequencing using RNA extracted from nasopharyngeal swabs of patients with COVID-19 and compared it with targeted SARS-CoV-2 genome amplification and sequencing in terms of virus detection, scalability, and cost-effectiveness. To track virus origin, we used open-source multiple sequence alignment and phylogenetic tools to compare the assembled SARS-CoV-2 genomes with publicly available sequences.

Results

We found considerable improvement in whole genome sequencing data quality and viral detection using amplicon-based target enrichment of SARS-CoV-2. With enrichment, more than 99% of the sequencing reads mapped to the viral genome, compared with an average of 0.63% without enrichment. Consequently, higher genome coverage was obtained with substantially less sequencing data, enabling higher scalability and sizable cost reductions. We also demonstrated how SARS-CoV-2 genome sequences can be used to infer their possible origin through phylogenetic analysis that includes other viral strains.
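The read-budget consequence of these on-target fractions can be made concrete with simple arithmetic. The sketch below takes only the 0.63% and 99% figures from the study; the per-sample target of 100,000 viral reads and the run output of 20 million reads are hypothetical planning assumptions, and the helper names are illustrative.

```python
import math


def reads_needed(target_viral_reads: int, on_target_fraction: float) -> int:
    """Total reads to sequence so that, on average, target_viral_reads map to the virus."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return math.ceil(target_viral_reads / on_target_fraction)


def samples_per_run(run_output_reads: int, reads_per_sample: int) -> int:
    """Number of samples that fit on one run at the given per-sample read budget."""
    return run_output_reads // reads_per_sample


# Hypothetical planning figures (assumptions, not values from the study).
TARGET_VIRAL_READS = 100_000
RUN_OUTPUT_READS = 20_000_000

# On-target fractions reported in the abstract: 0.63% (shotgun) vs >99% (amplicon).
for label, fraction in [("shotgun", 0.0063), ("amplicon", 0.99)]:
    per_sample = reads_needed(TARGET_VIRAL_READS, fraction)
    n = samples_per_run(RUN_OUTPUT_READS, per_sample)
    print(f"{label}: {per_sample:,} reads per sample, {n} samples per run")
```

Under these assumptions a shotgun library needs roughly 157 times as many reads per sample as an enriched one (0.99 / 0.0063), which is what drives the difference in multiplexing capacity and cost. A real budget would also have to allow for duplicate reads and uneven amplicon coverage.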

Conclusions

SARS-CoV-2 whole genome sequencing is a practical, cost-effective, and powerful approach for population-based surveillance and control of viral transmission in the next phase of the COVID-19 pandemic.

Article activity feed

  1. SciScore for 10.1101/2020.06.06.138339:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: This study was approved by the Dubai Scientific Research Ethics Committee - Dubai Health Authority (approval number #DSREC-04/2020_02).
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Indexed libraries from multiple patients were pooled and sequenced (2 × 150 cycles) using the MiSeq or the NovaSeq systems (Illumina, San Diego, CA, USA).
    MiSeq
    suggested: (A5-miseq, RRID:SCR_012148)
    Bioinformatics analysis and SARS-CoV-2 genome assembly: Demultiplexed FASTQ reads, obtained through shotgun or target enrichment sequencing, were generated from raw base call files using BCL2Fastq v2.20.0 and then mapped to the Wuhan reference genome (GenBank accession number: NC_045512.2) with the Burrows-Wheeler Aligner (BWA) v0.7.17.
    BCL2Fastq
    suggested: (bcl2fastq , RRID:SCR_015058)
    BWA
    suggested: (BWA, RRID:SCR_010910)
    Alignment statistics, such as coverage and mapped reads, were generated using Picard 2.18.17.
    Picard
    suggested: (Picard, RRID:SCR_006525)
    Variant calling was performed with GATK v3.8-1-0 and was followed by SARS-CoV-2 genome assembly using BCFtools v.
    GATK
    suggested: (GATK, RRID:SCR_001876)
    Phylogenetic analysis: We used Nextstrain (9), whose Augur v6.4.3 pipeline performs multiple sequence alignment (MAFFT v7.455) (10) and phylogenetic tree construction (IQ-TREE v1.6.12) (11).
    Augur
    suggested: None
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Tree visualization and annotations were performed in FigTree v1.4.4 (12).
    FigTree
    suggested: (FigTree, RRID:SCR_008515)
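The "variant calling followed by genome assembly" step listed above amounts to building a consensus: start from the reference (NC_045512.2), substitute the called alleles, and mask positions with too little read support. The sketch below is a simplified, hypothetical stand-in for that step and for the coverage statistics the study obtained from Picard. It is not BCFtools or Picard, it handles only single-nucleotide substitutions (no indels), and the minimum depth of 10 is an assumed threshold.

```python
from typing import Dict, List


def coverage_stats(depths: List[int], min_depth: int) -> Dict[str, float]:
    """Mean read depth and breadth (fraction of positions with depth >= min_depth)."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return {"mean_depth": sum(depths) / len(depths),
            "breadth": covered / len(depths)}


def build_consensus(reference: str, depths: List[int],
                    snvs: Dict[int, str], min_depth: int = 10) -> str:
    """Apply 1-based single-nucleotide variants to the reference, then mask
    positions with depth < min_depth as N. Indels are not handled."""
    if len(reference) != len(depths):
        raise ValueError("need one depth value per reference position")
    bases = list(reference.upper())
    for pos, alt in snvs.items():
        if not 1 <= pos <= len(bases) or len(alt) != 1:
            raise ValueError(f"invalid SNV at position {pos}: {alt!r}")
        bases[pos - 1] = alt.upper()
    # Masking runs last, so an SNV at a poorly covered site is also reported as N.
    for i, depth in enumerate(depths):
        if depth < min_depth:
            bases[i] = "N"
    return "".join(bases)


# Toy 10-base example (made-up data, not a SARS-CoV-2 region).
ref = "ACGTACGTAC"
depths = [0, 3, 25, 40, 40, 38, 30, 22, 12, 4]
print(coverage_stats(depths, min_depth=10))
print(build_consensus(ref, depths, snvs={4: "C", 7: "T"}))
```

Masking low-depth sites as N, rather than silently keeping the reference base, avoids reporting reference alleles where the data cannot support any call. This matters when assembled genomes are later compared phylogenetically.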

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    One possible limitation is the use of ultrasonication to fragment PCR products after SARS-CoV-2 whole genome amplification. Some labs might lack sonication systems because of accessibility and affordability issues. In such situations, our protocol can easily be modified to use enzymatic fragmentation provided by commercial kits, such as the Agilent SureSelectQXT kit. Furthermore, we have added M13 tails to all our primer sets, making them amenable to Sanger sequencing for labs not equipped with NGS. However, with this approach, manual analysis of the sequencing data limits scalability. Once sequences are generated, the bioinformatics analysis can be performed using open-source scripts. Labs without bioinformatics expertise or support can use online tools (INSaFlu: https://insaflu.insa.pt/; Genome Detective: https://www.genomedetective.com/app/typingtool/virus/) (7,8), which take raw sequencing (FASTQ) files to assemble viral genomes and to perform multiple sequence alignment and phylogenetic analysis for virus origin tracking. In addition, the described approach does not require significant data storage or computational investment, as shown by our cost, data, and scalability calculations (Table 3). Our phylogenetic analysis demonstrates how SARS-CoV-2 genomic sequencing can be used to track origins of virus transmission. However, data should be carefully interpreted and should be combined with other epidemiological information (such as travel...
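In the study, origin tracking relied on a MAFFT alignment and an IQ-TREE maximum-likelihood tree built through Nextstrain's Augur pipeline. As a much simpler illustration of the underlying idea of comparing an assembled genome against other strains, the sketch below computes p-distances between already-aligned sequences and reports the closest reference. Gaps and ambiguous bases such as N are ignored. The sequence labels and toy sequences are hypothetical. Nearest-neighbour distance is not a substitute for phylogenetic inference and, as the authors note, should be combined with epidemiological information.

```python
from typing import Dict, Tuple

VALID = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites among positions where both sequences have A/C/G/T.
    Gaps ('-') and ambiguous bases such as N are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID and y in VALID:
            compared += 1
            if x != y:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def closest_reference(query: str, references: Dict[str, str]) -> Tuple[str, float]:
    """Return (label, distance) for the reference with the smallest p-distance."""
    return min(((label, p_distance(query, seq)) for label, seq in references.items()),
               key=lambda item: item[1])


# Hypothetical aligned toy sequences (not real SARS-CoV-2 data).
refs = {
    "reference_A": "ACGTACGTACGT",
    "reference_B": "ACGTTCGTACGA",
    "reference_C": "TCGAACGTTCGT",
}
query = "ACGTTCGTNCGA"
print(closest_reference(query, refs))
```

Skipping N positions keeps masked, low-coverage sites from being counted as differences, so a partially covered genome is not pushed artificially far from its true relatives.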

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.