PAN-INDIA 1000 SARS-CoV-2 RNA Genome Sequencing Reveals Important Insights into the Outbreak

Abstract

The PAN-INDIA 1000 SARS-CoV-2 RNA Genome Sequencing Consortium has achieved its initial goal of completing the sequencing of 1000 SARS-CoV-2 genomes from nasopharyngeal and oropharyngeal swabs collected from individuals testing positive for COVID-19 by real-time PCR. The samples were collected across 10 states covering different zones within India. Given the importance of this information for public health response initiatives investigating transmission of COVID-19, the sequence data are being released in the GISAID database. This information will improve our understanding of how the virus is spreading, ultimately helping to interrupt the transmission chains, prevent new cases of infection, and provide impetus to research on intervention measures. It will also provide information on the evolution of the virus, genetic predisposition (if any) and adaptation to human hosts.

One thousand and fifty-two sequences were used for phylodynamic, temporal and geographic mutation pattern and haplotype network analyses. Initial results indicate that multiple lineages of SARS-CoV-2 are circulating in India, probably introduced by travel from Europe, the USA and East Asia. A2a (20A/B/C) was found to be predominant, along with a few parental haplotypes (19A/B). In particular, there is a predominance of the D614G mutation, which is found to be emerging in almost all regions of the country. Additionally, mutations in important regions of the viral genome with significant geographical clustering have also been observed. The temporal landscape of haplotype diversity appears to be similar across regions pan-India, with haplotype diversity peaking between March and May, while by June A2a (20A/B/C) had emerged as the predominant haplotype. Within haplotypes, different states appear to have different proportions. Temporal and geographic patterns in the sequences obtained reveal interesting clustering of mutations. Some mutations are present at particularly high frequencies in one state as compared to others. The negative estimate of Tajima's D (D = −2.26817) is consistent with the rapid expansion of the SARS-CoV-2 population in India. Detailed mutational analysis across India, aimed at understanding the gradual emergence of mutants in different regions of the country and its possible implications, will help in better disease management.
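To make the Tajima's D statistic quoted above more concrete, here is a minimal sketch of the standard estimator (Tajima, 1989) applied to a set of aligned sequences. The abstract does not say which software produced the reported value, so the function name `tajimas_d` and the choice to skip alignment columns containing gaps or ambiguity codes are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def tajimas_d(seqs):
    """Return Tajima's D for a list of equal-length aligned sequences."""
    n = len(seqs)
    if n < 4:
        # The variance term is zero for n < 4, so D is undefined.
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    seg_sites = 0          # S: number of segregating sites
    pairwise_diffs = 0.0   # total differences summed over all pairs
    for column in zip(*seqs):
        counts = Counter(base.upper() for base in column)
        # Assumption: ignore columns with gaps or ambiguity codes.
        if set(counts) - set("ACGT"):
            continue
        if len(counts) > 1:
            seg_sites += 1
            # Number of sequence pairs that differ at this site.
            pairwise_diffs += (n * n - sum(c * c for c in counts.values())) / 2

    if seg_sites == 0:
        raise ValueError("no segregating sites")

    pi = pairwise_diffs / (n * (n - 1) / 2)  # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)
```

An excess of rare variants, as expected in a rapidly expanding population, pushes the mean pairwise difference below S/a1 and yields a negative D. For example, `tajimas_d(["AAAA", "TAAA", "ATAA", "AATA", "AAAT"])` is negative because every variant is a singleton.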

Article activity feed

  1. SciScore for 10.1101/2020.08.03.233718:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    For shotgun RNA sequencing data and captured viral RNA sequencing data, sequencing reads were mapped to reference viral genome sequence and consensus sequence for each sample was built using Dragen RNA pathogen detection software (version 9) in BaseSpace (Illumina Inc, USA).
    BaseSpace
    suggested: (BaseSpace, RRID:SCR_011881)
    1010 sequences that passed QC criteria were finally aligned using MAFFT.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    The country confidence information of MRCAs for each subgroup was extracted and curated from auspice timetree using python scripts.
    python
    suggested: (IPython, RRID:SCR_001658)
    Mutation Analysis: Haplotype network analysis: For haplotype network, 815 sequences were selected out of the 1034 genome sequences that were generated as a part of the DBT’s PAN-India 1000 SARS-CoV2 RNA genome sequencing consortium.
    Mutation Analysis
    suggested: None
    Then the aligned fasta file was converted to Phylip format using a custom Biopython (22) script for haplotype network reconstruction.
    Biopython
    suggested: (Biopython, RRID:SCR_007173)
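The FASTA-to-PHYLIP conversion step quoted in the table above can be sketched as follows. The authors' custom Biopython script is not shown in the source, so this is a dependency-free stand-in rather than their method. The function names `parse_fasta` and `fasta_to_phylip` are hypothetical, and the output is assumed to be relaxed sequential PHYLIP (full-length names, one sequence per line).

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            # Keep only the first token of the header as the sequence name.
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def fasta_to_phylip(text):
    """Convert aligned FASTA text to relaxed sequential PHYLIP text."""
    records = parse_fasta(text)
    if not records:
        raise ValueError("no FASTA records found")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences are not aligned to equal length")
    width = max(len(name) for name, _ in records)
    # Header line: number of sequences, then alignment length.
    lines = [f"{len(records)} {lengths.pop()}"]
    lines += [f"{name.ljust(width)}  {seq}" for name, seq in records]
    return "\n".join(lines) + "\n"
```

The equal-length check matters because PHYLIP declares a single alignment length in its header, so unaligned input is rejected rather than silently written out.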

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.