High throughput detection and genetic epidemiology of SARS-CoV-2 using COVIDSeq next-generation sequencing

Abstract

The rapid emergence of coronavirus disease 2019 (COVID-19) as a pandemic affecting millions of individuals globally has necessitated sensitive, high-throughput approaches for the diagnosis and surveillance of SARS-CoV-2 and for determining its genetic epidemiology. In the present study, we used the COVIDSeq protocol, which involves multiplex PCR, barcoding, and sequencing of samples, for high-throughput detection of SARS-CoV-2 and for deciphering its genetic epidemiology. We applied the approach to 752 clinical samples in duplicate, amounting to a total of 1536 samples, which could be sequenced on a single S4 flow cell on the NovaSeq 6000. Our analysis suggests high concordance between technical duplicates and high concordance of SARS-CoV-2 detection between the COVIDSeq and RT-PCR approaches. An in-depth analysis revealed six samples that were negative by RT-PCR but in which COVIDSeq detected SARS-CoV-2 with high confidence. Additionally, the assay detected SARS-CoV-2 in 21 samples classified as inconclusive and in 16 samples classified as pan-sarbeco positive, suggesting that COVIDSeq could be used as a confirmatory test. The sequencing approach also provided insights into the evolution and genetic epidemiology of the SARS-CoV-2 samples, which were classified into a total of 3 clades. This study reports two lineages, B.1.112 and B.1.99, for the first time in India. It also revealed 1,143 unique single nucleotide variants and identified 73 novel variants reported for the first time. To the best of our knowledge, this is the first report of the COVIDSeq approach for the detection and genetic epidemiology of SARS-CoV-2. Our analysis suggests that COVIDSeq could be a high-sensitivity assay for the detection of SARS-CoV-2, with the additional advantage of enabling genetic epidemiology studies of the virus.
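The concordance claims above reduce to counts in a 2x2 table of paired calls. As a minimal sketch, assuming each sample's result is collapsed to a boolean detected/not-detected call (inconclusive and pan-sarbeco results would need their own rule, not modelled here), the standard diagnostic metrics against RT-PCR can be computed as below. Cohen's kappa is included as one common way to summarize agreement between technical duplicates; the study's own concordance measure is not specified here, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Confusion:
    """2x2 tally of a test's calls against a reference method's calls."""
    tp: int  # test positive, reference positive
    fp: int  # test positive, reference negative
    tn: int  # test negative, reference negative
    fn: int  # test negative, reference positive

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn


def confusion_from_calls(test: Sequence[bool], reference: Sequence[bool]) -> Confusion:
    """Pair calls sample by sample; True means SARS-CoV-2 detected."""
    if len(test) != len(reference):
        raise ValueError("test and reference must cover the same samples")
    pairs = list(zip(test, reference))
    return Confusion(
        tp=sum(1 for t, r in pairs if t and r),
        fp=sum(1 for t, r in pairs if t and not r),
        tn=sum(1 for t, r in pairs if not t and not r),
        fn=sum(1 for t, r in pairs if not t and r),
    )


def _ratio(num: float, den: float) -> float:
    # Return NaN instead of raising when a denominator is zero.
    return num / den if den else float("nan")


def diagnostic_metrics(c: Confusion) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and precision with the reference as truth."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "accuracy": _ratio(c.tp + c.tn, c.total),
        "precision": _ratio(c.tp, c.tp + c.fp),
    }


def cohens_kappa(c: Confusion) -> float:
    """Chance-corrected agreement, e.g. replicate 1 (test) vs replicate 2 (reference)."""
    n = c.total
    if n == 0:
        return float("nan")
    observed = (c.tp + c.tn) / n
    p_pos_test = (c.tp + c.fp) / n
    p_pos_ref = (c.tp + c.fn) / n
    expected = p_pos_test * p_pos_ref + (1 - p_pos_test) * (1 - p_pos_ref)
    return _ratio(observed - expected, 1 - expected)


if __name__ == "__main__":
    # Hypothetical calls for six samples, not data from the study.
    covidseq = [True, True, False, True, False, False]
    rt_pcr = [True, True, False, False, False, True]
    counts = confusion_from_calls(covidseq, rt_pcr)
    print(diagnostic_metrics(counts), cohens_kappa(counts))
```

The same `confusion_from_calls` tally serves both comparisons: pass COVIDSeq and RT-PCR calls for method concordance, or the two replicate calls for duplicate concordance.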

Article activity feed

  1. SciScore for 10.1101/2020.08.10.242677

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: IRB: Patients and Samples: The study was approved by the Institutional Human Ethics Committee (IHEC No. Dated CSIR-IGIB/IHEC/2020-21/01).
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    This included demultiplexing the raw data to FASTQ files using bcl2fastq (v2.20) followed by quality assessment of the FASTQ files using Trimmomatic (v0.39) (Bolger, Lohse and Usadel, 2014).
    bcl2fastq
    suggested: (bcl2fastq, RRID:SCR_015058)
    Trimmomatic
    suggested: (Trimmomatic, RRID:SCR_011848)
    Variant calling was performed using VarScan (v2.4.4) for samples with genome coverage greater than 99% (Koboldt et al., 2009).
    VarScan
    suggested: (VARSCAN, RRID:SCR_006849)
    Samtools (v 1.10) (Li et al., 2009), bcftools (v 1.10.2), and seqtk (version 1.3-r114) (Shen et al., 2016) were used to generate the consensus sequence.
    Samtools
    suggested: (SAMTOOLS, RRID:SCR_002105)
    Annotation of Genetic Variants and Comparison with existing datasets: The variants were systematically annotated using ANNOVAR (Wang, Li and Hakonarson, 2010).
    ANNOVAR
    suggested: (ANNOVAR, RRID:SCR_012821)
    Annotations on genomic loci and functional consequences of the protein were retrieved from RefSeq.
    RefSeq
    suggested: (RefSeq, RRID:SCR_003496)
    The genome sequences were aligned using MAFFT to the reference genome and problematic variant positions were masked (Katoh and Toh, 2008).
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    We compared the sample type (e.g. positive, pan-sarbeco, inconclusive and negative) WGS output and calculated percent of genome covered, sensitivity, specificity, accuracy, precision and gain of detection rate.
    WGS
    suggested: None
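The first pipeline step above passes demultiplexed FASTQ files through Trimmomatic. Its SLIDINGWINDOW step scans a read from the 5' end and clips it once the mean quality within a fixed-size window falls below a threshold. The following is a simplified re-implementation of that idea for a single FASTQ record, assuming Phred+33 quality encoding; the defaults of a 4-base window and Q20 are illustrative assumptions, not the settings used in the study, and Trimmomatic itself may handle the bases inside the failing window differently.

```python
from typing import List, Tuple


def phred33_scores(quality: str) -> List[int]:
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def sliding_window_trim(
    seq: str, quality: str, window: int = 4, min_mean_q: float = 20.0
) -> Tuple[str, str]:
    """Clip the read at the start of the first window whose mean quality is below min_mean_q.

    Hypothetical simplification of Trimmomatic's SLIDINGWINDOW:<window>:<quality>.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    if window < 1:
        raise ValueError("window must be at least 1")
    scores = phred33_scores(quality)
    # Scan windows from the 5' end; reads shorter than the window are returned as-is.
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean_q:
            return seq[:start], quality[:start]
    return seq, quality


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(sliding_window_trim("ACGTACGTAC", "IIIIIII###"))
```

Cutting at the start of the failing window is the most conservative choice; it may discard a few good bases near the low-quality tail.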

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.

  2. SciScore for 10.1101/2020.08.10.242677


    Table 1: Rigor

    Institutional Review Board Statement: Materials and Methods: Patients and Samples: The study was approved by the Institutional Human Ethics Committee (IHEC No. Dated CSIR-IGIB/IHEC/2020-21/01).
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    This included demultiplexing the raw data to FASTQ files using bcl2fastq (v2.20) followed by quality assessment of the FASTQ files using Trimmomatic (v0.39) (Bolger, Lohse and Usadel, 2014).
    bcl2fastq
    suggested: (bcl2fastq, RRID:SCR_015058)
    Trimmomatic
    suggested: (Trimmomatic, RRID:SCR_011848)
    Variant calling was performed using VarScan (v2.4.4) for samples with genome coverage greater than 99% (Koboldt et al., 2009).
    VarScan
    suggested: (VARSCAN, RRID:SCR_006849)
    Samtools (v 1.10) (Li et al., 2009), bcftools (v 1.10.2), and seqtk (version 1.3-r114) (Shen et al., 2016) were used to generate the consensus sequence.
    Samtools
    suggested: (Samtools, RRID:SCR_002105)
    Annotation of Genetic Variants and Comparison with existing datasets: The variants were systematically annotated using ANNOVAR (Wang, Li and Hakonarson, 2010).
    ANNOVAR
    suggested: (ANNOVAR, RRID:SCR_012821)
    Annotations on genomic loci and functional consequences of the protein were retrieved from RefSeq.
    RefSeq
    suggested: (RefSeq, RRID:SCR_003496)
    The genome sequences were aligned using MAFFT to the reference genome and problematic variant positions were masked (Katoh and Toh, 2008).
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Data availability: Raw datasets are available at NCBI short Read Archive with Project ID PRJNA655577.
    NCBI short Read Archive
    suggested: None
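The methods above restrict variant calling to samples whose genome coverage exceeds 99% and build a consensus sequence with samtools, bcftools, and seqtk. The logic of that gate and of a masked consensus can be sketched in plain Python, assuming per-position read depths over the whole reference (for example, as produced by `samtools depth -a`) and SNV calls given as a dictionary keyed by 1-based position. The minimum depth of 10 and the rule of masking low-depth positions with N are hypothetical choices for illustration; the study's exact coverage definition and thresholds are not stated here.

```python
from typing import Dict, Sequence


def percent_genome_covered(depths: Sequence[int], min_depth: int = 10) -> float:
    """Percentage of reference positions with at least min_depth reads (hypothetical threshold)."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


def passes_coverage_gate(
    depths: Sequence[int], min_depth: int = 10, threshold_pct: float = 99.0
) -> bool:
    """True only when coverage is strictly greater than threshold_pct (the '>99%' rule)."""
    return percent_genome_covered(depths, min_depth) > threshold_pct


def masked_consensus(
    reference: str, depths: Sequence[int], snvs: Dict[int, str], min_depth: int = 10
) -> str:
    """Apply SNVs (1-based positions) to the reference, then mask low-depth positions with N."""
    if len(reference) != len(depths):
        raise ValueError("need one depth value per reference position")
    bases = list(reference)
    for pos, alt in snvs.items():
        if not 1 <= pos <= len(bases) or len(alt) != 1:
            raise ValueError(f"invalid SNV {pos}:{alt}")
        bases[pos - 1] = alt
    for i, depth in enumerate(depths):
        if depth < min_depth:
            bases[i] = "N"  # unsupported position: report as unknown, not as reference
    return "".join(bases)


if __name__ == "__main__":
    # Toy 10-base "genome" with no reads over the last position.
    ref = "ACGTACGTAC"
    depths = [30] * 9 + [0]
    print(percent_genome_covered(depths), passes_coverage_gate(depths))
    print(masked_consensus(ref, depths, {3: "T"}))
```

Masking with N rather than falling back to the reference base avoids reporting reference alleles at positions the reads never supported. In a real pipeline, `bcftools consensus` applies VCF calls to a reference FASTA; this sketch only mirrors the idea.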

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.

