Evaluation of NGS-based approaches for SARS-CoV-2 whole genome characterisation

Abstract

Since the beginning of the COVID-19 outbreak, SARS-CoV-2 whole-genome sequencing (WGS) has been performed at an unprecedented rate worldwide using highly diverse Next Generation Sequencing (NGS) methods. Herein, we compare the performance of four NGS-based approaches for SARS-CoV-2 WGS. Twenty-four clinical respiratory samples spanning a wide range of Ct values (from 10.7 to 33.9) were sequenced with four methods. Three used Illumina sequencing: an in-house metagenomic NGS (mNGS) protocol and two newly commercialized kits, namely a hybridization capture method developed by Illumina (DNA Prep with Enrichment kit and Respiratory Virus Oligo Panel, RVOP) and an amplicon sequencing method developed by Paragon Genomics (CleanPlex SARS-CoV-2 kit). We also evaluated the widely used amplicon sequencing protocol developed by the ARTIC Network, combined with Oxford Nanopore Technologies (ONT) sequencing. All four methods yielded near-complete genomes (>99%) for samples with high viral loads, with mNGS and RVOP producing the most complete genomes. For mid-range viral loads, 2/8 genomes were incomplete (<99%) with mNGS, and 1/8 with each of CleanPlex and RVOP. For low viral loads (Ct ≥25), amplicon-based enrichment methods were the most sensitive techniques, yielding complete genomes for 7/8 samples. All methods were highly concordant in terms of identity of the complete consensus sequences. Only one mismatch, in two samples, was observed with CleanPlex relative to the other methods, owing to its dedicated bioinformatics pipeline, which sets a high threshold for calling a SNP relative to the reference sequence. Importantly, all methods correctly identified a newly observed 34-nt deletion in ORF6, although this required specific bioinformatic validation for RVOP.
Finally, as a major warning for targeted techniques, a lack of coverage in any given region of the genome should alert to a potential rearrangement or a SNP in primer-annealing or probe-hybridizing regions, and the technique would require regular updates to keep pace with SARS-CoV-2 evolution.
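
The completeness criterion (>99% of the genome) and the warning about coverage gaps can be illustrated with a minimal Python sketch. It assumes a per-position depth list has already been extracted from an alignment; the function names and the `min_depth` value of 10 are hypothetical choices made for illustration, not the thresholds used in the study.

```python
from typing import List, Tuple


def genome_completeness(depths: List[int], min_depth: int = 10) -> float:
    """Fraction of positions covered at or above min_depth.

    min_depth=10 is an illustrative assumption, not the study's setting.
    """
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def low_coverage_regions(depths: List[int], min_depth: int = 10) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) intervals with depth below min_depth."""
    regions: List[Tuple[int, int]] = []
    start = None
    for pos, depth in enumerate(depths, start=1):
        if depth < min_depth:
            if start is None:
                start = pos  # open a new low-coverage interval
        elif start is not None:
            regions.append((start, pos - 1))  # close the interval
            start = None
    if start is not None:
        regions.append((start, len(depths)))  # interval runs to the genome end
    return regions


# Toy depth profile: a 34-position dropout flanked by well-covered sequence.
toy_depths = [50] * 100 + [0] * 34 + [50] * 100
is_complete = genome_completeness(toy_depths) > 0.99
gaps = low_coverage_regions(toy_depths)
```

In a targeted workflow, any interval returned by `low_coverage_regions` would be a candidate for manual review, since it may reflect a deletion or a mutation under a primer or probe rather than a simple lack of reads.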

Article activity feed

  1. SciScore for 10.1101/2020.07.14.201947:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentence: Briefly, low quality and human reads were filtered out (dehosting) and remaining reads were aligned to the SARS-CoV-2 reference genome (isolate Wuhan-Hu-1, EPI_ISL_402125) using the BWA-MEM (v0.7.15-r1140) algorithm.
    Resource: BWA-MEM
    Suggested: (Sniffles, RRID:SCR_017619)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    The present study does have several limitations. First, the 4 methods were tested on a limited number of European samples, all belonging to the lineage B1 according to the Pangolin classification (23); the present evaluation should be confirmed in larger studies including other lineages in order to be more representative of the global ecology of this emerging virus. As SARS-CoV-2 sequencing is an expanding market and other methods have been published (24–26) or are probably under development, the present study is not exhaustive. However, the approaches selected herein are representative of the methods most used during the pandemic and are easy to implement in a diagnostic laboratory. As mentioned above, dedicated bioinformatic processes have been implemented as recommended by suppliers. Except for RVOP, for which we compared the aligner used in DRAGEN with minimap2, we chose not to perform a comparison of the bioinformatic tools. In this context, further bioinformatic evaluations are needed, especially concerning parameters such as the variant frequency threshold for calling a SNP relative to the reference sequence, as well as the minimum depth for calling the consensus. Additionally, de novo assembly should be evaluated as it may provide better assessment of rearrangements such as large deletions, insertions and duplications that can be missed by classical alignment analyses, but this method is time-consuming and may not be appropriate for genomes with low coverage. More broadly, a repe...
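
The two parameters named above, the variant frequency threshold and the minimum depth for consensus calling, can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: per-position base counts are given as dictionaries, indels are ignored, and the function name `call_consensus` and its default values are hypothetical rather than taken from any of the pipelines evaluated.

```python
from typing import Dict, List


def call_consensus(
    reference: str,
    pileup: List[Dict[str, int]],
    min_depth: int = 10,     # assumed value: below this, the position is masked as N
    min_freq: float = 0.5,   # assumed value: alt-allele frequency needed to call a SNP
) -> str:
    """Build a consensus from per-position base counts (substitutions only)."""
    if len(reference) != len(pileup):
        raise ValueError("reference and pileup must have the same length")
    consensus = []
    for ref_base, counts in zip(reference, pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # not enough reads to call anything
            continue
        # Most frequent base that differs from the reference, if any.
        alt_counts = {b: c for b, c in counts.items() if b != ref_base}
        if alt_counts:
            alt_base, alt_count = max(alt_counts.items(), key=lambda kv: kv[1])
            if alt_count / depth >= min_freq:
                consensus.append(alt_base)  # variant passes the threshold
                continue
        consensus.append(ref_base)  # otherwise keep the reference base
    return "".join(consensus)


# Toy example: position 2 carries a T variant at 80% frequency,
# position 3 has only 3 reads.
toy_reference = "ACGT"
toy_pileup = [{"A": 100}, {"C": 20, "T": 80}, {"G": 3}, {"T": 50}]
permissive = call_consensus(toy_reference, toy_pileup, min_freq=0.5)
strict = call_consensus(toy_reference, toy_pileup, min_freq=0.9)
```

With the toy data, the permissive threshold calls the variant at position 2 while the strict threshold keeps the reference base, which is the kind of pipeline-dependent mismatch that a threshold comparison would need to quantify.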

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.