SARS-CoV-2 Whole Genome Amplification and Sequencing for Effective Population-Based Surveillance and Control of Viral Transmission
Abstract
Background
With the gradual reopening of economies and resumption of social life, robust surveillance mechanisms should be implemented to control the ongoing COVID-19 pandemic. Unlike RT-qPCR, SARS-CoV-2 whole genome sequencing (cWGS) has the added advantage of identifying cryptic origins of the virus, and the extent of community-based transmissions versus new viral introductions, which can in turn influence public health policy decisions. However, the practical and cost considerations of cWGS should be addressed before it is widely implemented.
Methods
We performed shotgun transcriptome sequencing using RNA extracted from nasopharyngeal swabs of patients with COVID-19, and compared it to targeted SARS-CoV-2 genome amplification and sequencing with respect to virus detection, scalability, and cost-effectiveness. To track virus origin, we used open-source multiple sequence alignment and phylogenetic tools to compare the assembled SARS-CoV-2 genomes to publicly available sequences.
Results
We found considerable improvement in whole genome sequencing data quality and viral detection using amplicon-based target enrichment of SARS-CoV-2. With enrichment, more than 99% of the sequencing reads mapped to the viral genome, compared to an average of 0.63% without enrichment. Consequently, an increase in genome coverage was obtained using substantially less sequencing data, enabling higher scalability and sizable cost reductions. We also demonstrated how SARS-CoV-2 genome sequences can be used to determine their possible origin through phylogenetic analysis including other viral strains.
Conclusions
SARS-CoV-2 whole genome sequencing is a practical, cost-effective, and powerful approach for population-based surveillance and control of viral transmission in the next phase of the COVID-19 pandemic.
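The mapping rates reported in the Results (more than 99% with enrichment versus an average of 0.63% without) translate directly into how many reads must be sequenced per sample. The sketch below is a back-of-the-envelope illustration, not the authors' cost model: it assumes the usual mean-depth relation (depth = viral reads × read length / genome length), the 29,903-base length of reference NC_045512.2, 150-base reads as in the 2 × 150 run, and a hypothetical target of 100x mean depth. The function name `reads_needed` is invented for this example, and the calculation ignores duplicates and uneven coverage.

```python
import math

GENOME_LENGTH = 29_903  # bases in reference NC_045512.2
READ_LENGTH = 150       # bases per read (2 x 150 run)


def reads_needed(target_depth, on_target_fraction,
                 genome_length=GENOME_LENGTH, read_length=READ_LENGTH):
    """Total reads to sequence so that viral reads reach the target mean depth.

    Hypothetical helper: assumes every mapped read contributes its full
    length and that coverage is uniform across the genome.
    """
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    viral_reads = target_depth * genome_length / read_length
    return math.ceil(viral_reads / on_target_fraction)


TARGET_DEPTH = 100  # assumed target, not a value from the preprint

shotgun_reads = reads_needed(TARGET_DEPTH, 0.0063)  # 0.63% of reads map
enriched_reads = reads_needed(TARGET_DEPTH, 0.99)   # 99% of reads map
fold_reduction = shotgun_reads / enriched_reads

print(f"shotgun:  {shotgun_reads:,} reads")
print(f"enriched: {enriched_reads:,} reads")
print(f"fold reduction in sequencing: {fold_reduction:.0f}x")
```

Under these assumptions the ratio is simply 0.99 / 0.0063, so roughly 150 to 160 times fewer reads are needed per sample with enrichment, which is the basis for the scalability and cost argument.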
SciScore for 10.1101/2020.06.06.138339
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Institutional Review Board Statement: IRB: This study was approved by the Dubai Scientific Research Ethics Committee - Dubai Health Authority (approval number #DSREC-04/2020_02).
Randomization: not detected.
Blinding: not detected.
Power Analysis: not detected.
Sex as a biological variable: not detected.
Table 2: Resources
Software and Algorithms
Sentences | Resources

Sentence: Indexed libraries from multiple patients were pooled and sequenced (2 × 150 cycles) using the MiSeq or the NovaSeq systems (Illumina, San Diego, CA, USA).
- MiSeq, suggested: (A5-miseq, RRID:SCR_012148)

Sentence: Bioinformatics analysis and SARS-CoV-2 genome assembly: Demultiplexed Fastq reads, obtained through shotgun or target enrichment sequencing, were generated from raw sequencing base call files using BCL2Fastq v2.20.0, and then mapped to the reference Wuhan genome (GenBank accession number: NC_045512.2) by Burrows-Wheeler Aligner, BWA v0.7.17.
- BCL2Fastq, suggested: (bcl2fastq, RRID:SCR_015058)
- BWA, suggested: (BWA, RRID:SCR_010910)

Sentence: Alignment statistics, such as coverage and mapped reads, were generated using Picard 2.18.17.
- Picard, suggested: (Picard, RRID:SCR_006525)

Sentence: Variant calling was performed by GATK v3.8-1-0, and was followed by SARS-CoV-2 genome assembly using BCFtools v.
- GATK, suggested: (GATK, RRID:SCR_001876)

Sentence: Phylogenetic analysis: We used Nextstrain (9), which consists of Augur v6.4.3 pipeline for multiple sequence alignment (MAFFT v7.455) (10) and phylogenetic tree construction (IQtree v1.6.12) (11).
- Augur, suggested: None
- MAFFT, suggested: (MAFFT, RRID:SCR_011811)

Sentence: Tree visualization and annotations were performed in FigTree v1.4.4 (12).
- FigTree, suggested: (FigTree, RRID:SCR_008515)

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
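The alignment-statistics and genome-assembly steps quoted in Table 2 can be illustrated with a toy sketch. It does not call Picard, GATK, or BCFtools and does not reproduce their output; it only shows, in plain Python, what mean depth and breadth of coverage mean and how called single-nucleotide variants are applied to a reference to obtain a consensus genome. The function names, the 10x masking threshold, and the ten-base "reference" are all assumptions made up for the example.

```python
def coverage_stats(depths, min_depth=10):
    """Return (mean depth, fraction of positions with depth >= min_depth)."""
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    breadth = sum(d >= min_depth for d in depths) / len(depths)
    return mean_depth, breadth


def apply_snvs(reference, snvs, depths, min_depth=10):
    """Build a consensus from a reference and called SNVs.

    snvs maps a 1-based position to a (ref_base, alt_base) pair.
    Positions covered by fewer than min_depth reads are masked with 'N'.
    """
    if len(depths) != len(reference):
        raise ValueError("depths and reference must have the same length")
    consensus = list(reference)
    for pos, (ref_base, alt_base) in snvs.items():
        if reference[pos - 1] != ref_base:
            raise ValueError(f"reference mismatch at position {pos}")
        consensus[pos - 1] = alt_base
    for i, depth in enumerate(depths):
        if depth < min_depth:
            consensus[i] = "N"
    return "".join(consensus)


# Toy data, not taken from the preprint.
reference = "ACGTACGTAC"
depths = [0, 5, 12, 30, 30, 28, 15, 9, 40, 40]
snvs = {4: ("T", "C")}  # one T>C substitution at position 4

mean_depth, breadth = coverage_stats(depths)
consensus = apply_snvs(reference, snvs, depths)

print(f"mean depth {mean_depth:.1f}x, breadth at 10x {breadth:.0%}")
print(consensus)
```

Masking low-depth positions rather than copying the reference base is a design choice in this sketch: it keeps uncovered regions from looking identical to the Wuhan reference in a downstream alignment.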
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
One possible limitation is the use of ultra-sonication for fragmentation of PCR products after SARS-CoV-2 whole genome amplification. Several labs might lack sonication systems due to accessibility and affordability issues. In such situations, our protocol can be easily modified to use enzymatic fragmentation instead provided by commercial kits, such as the Agilent SureSelectQXT kit. Furthermore, we have added M13 tails to all our primer sets making them amenable to Sanger sequencing for those labs not equipped with NGS. However, with this approach, manual analysis of sequencing data limits scalability of the approach. Upon sequence generation, the bioinformatics analysis can be performed using open source scripts. Labs without bioinformatics expertise or support can use online tools (INSaFlu: https://insaflu.insa.pt/; Genome Detective: https://www.genomedetective.com/app/typingtool/virus/) (7,8) which can take raw sequencing (Fastq) files to assemble viral genomes, and to perform multiple sequence alignment and phylogenetic analysis for virus origin tracking. In addition, the described approach does not require significant data storage or computational investment as shown by our cost, data, and scalability calculations (Table 3). Our phylogenetic analysis demonstrates how SARS-CoV-2 genomic sequencing can be used to track origins of virus transmission. However, data should be carefully interpreted, and should be combined with other epidemiological information (such as travel...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- No funding statement was detected.
- No protocol registration statement was detected.