Initial Insights Into the Genetic Epidemiology of SARS-CoV-2 Isolates From Kerala Suggest Local Spread From Limited Introductions

Abstract

Coronavirus disease 2019 (COVID-19) rapidly spread from a city in China to almost every country in the world, affecting millions of individuals. The rapid increase in COVID-19 cases in the state of Kerala in India has necessitated an understanding of SARS-CoV-2 genetic epidemiology. We sequenced 200 samples from patients in Kerala using the COVIDSeq amplicon-based sequencing protocol. The analysis identified 166 high-quality single-nucleotide variants, including four novel variants and 89 variants new to Indian SARS-CoV-2 isolates. Phylogenetic and haplotype analysis revealed that the isolates belong to the A2a clade and were dominated by three distinct introductions followed by local spread, suggesting recent outbreaks. Further analysis of the functional variants revealed two variants in the S gene associated with increased infectivity and five variants in primer binding sites that affect the efficacy of RT-PCR. To the best of our knowledge, this is the first and most comprehensive report of SARS-CoV-2 genetic epidemiology from Kerala.

Article activity feed

  1. SciScore for 10.1101/2020.09.09.289892:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    The base calls generated in the binary base call (BCL) format were demultiplexed to FASTQ reads using bcl2fastq (v2.20).
    bcl2fastq
    suggested: (bcl2fastq , RRID:SCR_015058)
    As per the protocol, the quality control of FASTQ reads was performed using Trimmomatic (v0.39) at a Phred score of Q30 [19] with adapter trimming.
    Trimmomatic
    suggested: (Trimmomatic, RRID:SCR_011848)
    The samples with coverage >99% and <5% unassigned nucleotides underwent variant calling and consensus sequence generation using VarScan (v2.4.4) [22] and SAMtools (v1.10) [21], bcftools (v1.10.2), and seqtk (v1.3-r114) [23], respectively.
    VarScan
    suggested: (VARSCAN, RRID:SCR_006849)
    SAMtools
    suggested: (SAMTOOLS, RRID:SCR_002105)
    Variant Annotation and Comparison with Existing Datasets: Variants were annotated using ANNOVAR [24] employing a range of custom annotation datasets and tables.
    ANNOVAR
    suggested: (ANNOVAR, RRID:SCR_012821)
    Haplotype Analysis: For haplotype analysis, the genomes were aligned to the Wuhan-Hu-1 (NC_045512.2) reference genome using MAFFT [29] and problematic genomic loci (low coverage, high sequencing error rate, hypermutable and homoplasic sites) were masked from the alignment [30].
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    The aligned sequences were imported into the DNA Sequence Polymorphism tool (DnaSP v6.12.03) [31] to generate haplotypes.
    DnaSP
    suggested: (DnaSP, RRID:SCR_003067)
    Times to the most recent common ancestor (tMRCA) for the haplogroups were computed following the Bayesian Markov chain Monte Carlo (MCMC) method using BEAST v1.10.4 [34].
    BEAST
    suggested: (BEAST, RRID:SCR_010228)
    The output was analyzed in Tracer v1.7.1 [35] and burn-in was adjusted to attain an appropriate effective sample size (ESS).
    Tracer
    suggested: (Tracer, RRID:SCR_019121)
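
The sample inclusion rule quoted above (coverage >99% and <5% unassigned nucleotides) can be illustrated with a minimal sketch. This is not the authors' code: the function name `passes_qc`, its parameters, and the definitions used here (coverage as the fraction of reference positions with read depth above zero, unassigned nucleotides as the fraction of `N` characters in the consensus) are assumptions made for illustration only.

```python
from typing import Sequence


def passes_qc(
    depths: Sequence[int],
    consensus: str,
    min_coverage: float = 0.99,
    max_unassigned: float = 0.05,
) -> bool:
    """Return True if a sample meets both thresholds (hypothetical helper).

    depths    -- per-position read depth along the reference (assumed input)
    consensus -- consensus sequence, with 'N' marking unassigned nucleotides
    """
    if not depths or not consensus:
        return False

    # Assumed definition: fraction of reference positions covered by >= 1 read.
    coverage = sum(1 for d in depths if d > 0) / len(depths)

    # Assumed definition: fraction of consensus bases left unassigned.
    unassigned = consensus.upper().count("N") / len(consensus)

    # Strict inequalities mirror the ">99%" and "<5%" wording in the text.
    return coverage > min_coverage and unassigned < max_unassigned


if __name__ == "__main__":
    good_depths = [25] * 1000
    good_consensus = "A" * 990 + "N" * 10
    print(passes_qc(good_depths, good_consensus))
```

In practice the per-position depths might come from a tool such as `samtools depth`, but how the authors computed either metric is not stated in the quoted sentence.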

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    The study has two caveats; first is that the samples were collected from a single major tertiary care center in North Kerala. However, the center caters to a large population and region and has close proximity to an international airport. Secondly, the sampling was limited to a short period of time, thus enabling only a cross-sectional view of the epidemic and precluding an accurate and temporal view of the dynamics of the epidemic in the state. Nevertheless, this provides a unique opportunity to create a snapshot of the epidemic in time and space. Notwithstanding the limitations, this is the first and most comprehensive overview of the genetic epidemiology of SARS-CoV-2 in the state of Kerala. While providing insights into the epidemiology of the epidemic, the study also enabled tracing outbreaks thereby highlighting the utility of genome sequencing as an adjunct to high-throughput screening and testing. It has not escaped our mind that scalable technologies that can combine both the approaches [7] could potentially find a place in understanding epidemics better.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.