A short plus long-amplicon based sequencing approach improves genomic coverage and variant detection in the SARS-CoV-2 genome
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
High viral transmission in the COVID-19 pandemic has enabled SARS-CoV-2 to acquire new mutations that may impact genome sequencing methods. The ARTIC.v3 primer pool, which amplifies short amplicons in a multiplex-PCR reaction, is one of the most widely used methods for sequencing the SARS-CoV-2 genome. We observed that some genomic intervals are poorly captured with ARTIC primers. To improve the genomic coverage and variant detection across these intervals, we designed long-amplicon primers and evaluated the performance of a short (ARTIC) plus long amplicon (MRL) sequencing approach. Sequencing assays were optimized on VR-1986D-ATCC RNA, followed by sequencing of nasopharyngeal swab specimens from fifteen COVID-19 positive patients. ARTIC data covered 94.47% of the viral genome in the positive control and patient samples. Variant analysis in the ARTIC data detected 217 mutations, including 209 single nucleotide variants (SNVs) and eight insertions and deletions. Long-amplicon data, by comparison, detected 156 mutations, of which 80% were concordant with ARTIC data. Combined analysis of ARTIC + MRL data improved the genomic coverage to 97.03% and identified 214 high-confidence mutations. This combined final set included 203 SNVs, 8 deletions and 3 insertions. Analysis showed 26 SARS-CoV-2 lineage-defining mutations, including four known variant-of-concern mutations (K417N, E484K, N501Y, P618H) in the spike gene. Hybrid analysis identified 7 nonsynonymous and 5 synonymous mutations across the genome that were either ambiguous or not called in ARTIC data. For example, the G172V mutation in the ORF3a protein and the A2A mutation in the membrane protein were missed by the ARTIC assay. Thus, we show that while the short amplicon (ARTIC) assay provides good genomic coverage with high throughput, complementing poorly captured intervals with long amplicon data can significantly improve SARS-CoV-2 genomic coverage and variant detection.
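The two headline metrics in the abstract, genome coverage and concordance between call sets, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `min_depth` threshold of 10 reads, the function names, and the toy depth and variant values are all assumptions made for illustration only.

```python
from typing import Iterable, NamedTuple, Sequence, Set


class Variant(NamedTuple):
    pos: int  # 1-based reference position
    ref: str  # reference allele
    alt: str  # alternate allele


def genome_fraction(depth: Sequence[int], min_depth: int = 10) -> float:
    """Percentage of reference positions with depth >= min_depth.

    min_depth=10 is an assumed threshold, not a value from the abstract.
    """
    if not depth:
        return 0.0
    covered = sum(1 for d in depth if d >= min_depth)
    return 100.0 * covered / len(depth)


def combined_fraction(depth_a: Sequence[int], depth_b: Sequence[int],
                      min_depth: int = 10) -> float:
    """Percentage of positions covered by at least one of two assays."""
    if len(depth_a) != len(depth_b):
        raise ValueError("depth vectors must span the same reference")
    if not depth_a:
        return 0.0
    covered = sum(1 for a, b in zip(depth_a, depth_b)
                  if a >= min_depth or b >= min_depth)
    return 100.0 * covered / len(depth_a)


def concordance(calls: Iterable[Variant], other: Set[Variant]) -> float:
    """Percentage of variants in `calls` that are also present in `other`."""
    calls = set(calls)
    if not calls:
        return 0.0
    return 100.0 * len(calls & other) / len(calls)


# Toy per-base depths over a 10-base "genome" (made-up numbers).
short_depth = [0, 0, 30, 40, 50, 5, 60, 70, 80, 90]
long_depth = [25, 20, 15, 0, 0, 0, 0, 0, 0, 0]

# Toy call sets (made-up positions and alleles).
short_calls = {Variant(3, "C", "T"), Variant(7, "A", "G"), Variant(9, "G", "T")}
long_calls = {Variant(1, "G", "A"), Variant(3, "C", "T"), Variant(7, "A", "G"),
              Variant(9, "G", "T"), Variant(2, "T", "C")}

print(genome_fraction(short_depth))                  # short amplicons alone
print(combined_fraction(short_depth, long_depth))    # short plus long
print(concordance(long_calls, short_calls))          # long calls seen in short data
```

In this toy input the long-amplicon depths fill two positions that the short amplicons miss, which is the same kind of complementation the abstract reports at genome scale.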
Article activity feed
SciScore for 10.1101/2021.06.16.21259029:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Ethics: IRB: Samples were de-identified and analyzed with a waiver from UT Southwestern Institutional Review Board. b. Primers: We designed our own set of primers that amplify long-amplicons spanning 1.5-2.5Kb of the SARS-CoV-2 viral genome.
Sex as a biological variable: not detected.
Randomization: not detected.
Blinding: not detected.
Power Analysis: not detected.
Table 2: Resources
Software and Algorithms (columns: Sentences, Resources)
- Sentence: The third assay also used only on our own primer design (MRL Primer), but the long-amplicons were directly sequenced on the long-read sequencing platform, MinION, from Oxford Nanopore Technology (ONT).
  Resource: MinION, suggested: (MinION, RRID:SCR_017985)
- Sentence: The presence of host reads were detected using Kraken 2, the host genome version used was GRCh38.
  Resource: Kraken, suggested: (Kraken, RRID:SCR_005484)
- Sentence: Quast was used to summarize assembly statistics where the reference viral genome was NC_045512.2.
  Resource: Quast, suggested: (QUAST, RRID:SCR_001228)
- Sentence: Fast5 files from Mk1C runs were base called and demultiplexed with guppy (GPU enabled), PycoQC, FastQC and NanoPlot were used for sequencing QC visualizations.
  Resource: FastQC, suggested: (FastQC, RRID:SCR_014583)
- Sentence: Variants were called using Nanopolish where ploidy was set to 1.
  Resource: Nanopolish, suggested: (Nanopolish, RRID:SCR_016157)
- Sentence: Combination of Illumina MiSeq paired end reads and Oxford Nanopore Mk1C long reads were assembled using SPAdes, hybrid assembly stats were summarized with Quast.
  Resource: SPAdes, suggested: (SPAdes, RRID:SCR_000131)
Results from OddPub: Thank you for sharing your data.
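The host-read screening step quoted above (Kraken 2 against a human reference) can be summarized from a Kraken 2 report file. The sketch below is only an illustration and not the authors' code: it assumes the standard six-column, tab-separated report layout (percentage, clade read count, direct read count, rank code, NCBI taxid, indented name), and the function name and toy report contents are hypothetical.

```python
def host_read_percentage(report_text: str, host_taxid: int = 9606) -> float:
    """Percentage of all reads assigned to the host clade in a Kraken 2 report.

    Assumes the standard six-column tab-separated report format.
    host_taxid defaults to 9606 (Homo sapiens).
    """
    total_reads = 0
    host_reads = 0
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(fields[1])
        taxid = int(fields[4])
        # Unclassified (taxid 0) plus root (taxid 1) together account for all reads.
        if taxid in (0, 1):
            total_reads += clade_reads
        if taxid == host_taxid:
            host_reads = clade_reads
    if total_reads == 0:
        return 0.0
    return 100.0 * host_reads / total_reads


# Toy report with made-up counts: 1000 reads, 200 of them human.
toy_report = "\n".join([
    " 10.00\t100\t100\tU\t0\tunclassified",
    " 90.00\t900\t0\tR\t1\troot",
    " 20.00\t200\t200\tS\t9606\t    Homo sapiens",
    " 70.00\t700\t700\tS\t2697049\t    Severe acute respiratory syndrome coronavirus 2",
])

print(host_read_percentage(toy_report))
```

Recomputing the percentage from read counts, rather than reading the first column, avoids depending on how the report rounds its percentages.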
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.