Worldwide tracing of mutations and the evolutionary dynamics of SARS-CoV-2

Abstract

Understanding the mutational and evolutionary dynamics of SARS-CoV-2 is essential for treating COVID-19 and developing a vaccine. Here, we analyzed 15,818 publicly available assembled SARS-CoV-2 genome sequences, along with 2,350 raw sequence datasets sampled worldwide. We investigated the distribution of inter-host single nucleotide polymorphisms (inter-host SNPs) and intra-host single nucleotide variations (iSNVs). Mutations were observed at 35.6% (10,649/29,903) of the bases in the genome. The substitution rate in some protein-coding regions is higher than the SARS-CoV-2 average, and the high substitution rate in some of these regions might be driven by diversifying selection to escape immune recognition. Both recurrent mutations and human-to-human transmission are mechanisms that generate fitness-advantageous mutations. Furthermore, the frequency of three mutations (S protein, F400L; ORF3a protein, T164I; and ORF1a protein, Q6383H) has gradually increased over time on lineages, which provides new clues for the early detection of fitness-advantageous mutations. Our study provides theoretical support for vaccine development and the optimization of treatment for COVID-19. We call on researchers to submit raw sequence data to public databases.
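
The proportion of mutated sites quoted in the abstract can be checked with a few lines of Python. The sketch below is only an illustration: the function names `percent_mutated` and `variable_sites` and the toy alignment are assumptions introduced here and are not taken from the preprint, whose own site-counting procedure is not described in this section.

```python
# Figures quoted in the abstract.
GENOME_LENGTH = 29_903   # bases in the SARS-CoV-2 reference genome
MUTATED_SITES = 10_649   # bases at which a mutation was observed


def percent_mutated(mutated_sites, genome_length):
    """Return the percentage of genome positions carrying a mutation."""
    return 100.0 * mutated_sites / genome_length


def variable_sites(reference, sequences):
    """Return 0-based positions where any aligned sequence differs from
    the reference. Ambiguous bases (e.g. N) and gaps are ignored.
    Hypothetical helper: assumes all sequences are already aligned to
    the reference and have the same length."""
    sites = set()
    for seq in sequences:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the reference")
        for pos, (ref_base, base) in enumerate(zip(reference, seq)):
            if base in "ACGT" and base != ref_base:
                sites.add(pos)
    return sorted(sites)


print(f"{percent_mutated(MUTATED_SITES, GENOME_LENGTH):.1f}%")  # 35.6%
```

On a toy alignment, `variable_sites("ACGT", ["ACGT", "AGGT", "ACNT"])` reports only position 1, because the `N` in the third sequence is treated as missing data rather than as a mutation.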

Article activity feed

  1. SciScore for 10.1101/2020.08.07.242263:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: Assembled sequences from Shanghai and Australia, which have corresponding raw sequence data in the SRA database (accession numbers PRJNA627662 and PRJNA613958), were downloaded from the GISAID database (https://www.gisaid.org/) and aligned using MUSCLE version 3.8.31 [11].
    Resource: MUSCLE
    Suggested: (MUSCLE, RRID:SCR_011812)

    Sentence: Maximum likelihood phylogenetic trees were constructed using IQ-TREE version 2.0.5 [20].
    Resource: IQ-TREE
    Suggested: (IQ-TREE, RRID:SCR_017254)

    Sentence: iSNVs analysis: To obtain high-confidence iSNVs, we first trimmed low-quality bases from the raw reads using Trimmomatic 0.39 with default parameters [22].
    Resource: Trimmomatic
    Suggested: (Trimmomatic, RRID:SCR_011848)

    Sentence: The passed reads were mapped to the SARS-CoV-2 reference genome using BWA mem version 0.7.17 with default parameters [23].
    Resource: BWA
    Suggested: (BWA, RRID:SCR_010910)

    Sentence: GATK MarkDuplicates version 4.1.3.0 was used to mark duplicate reads [24,25].
    Resource: GATK
    Suggested: (GATK, RRID:SCR_001876)

    Sentence: Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10.
    Resource: BEAST
    Suggested: (BEAST, RRID:SCR_010228)

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.

  2. SciScore for 10.1101/2020.08.07.242263:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentence: Assembled sequences from Shanghai and Australia, which have corresponding raw sequence data in the SRA database (accession numbers PRJNA627662 and PRJNA613958), were downloaded from the GISAID database (https://www.gisaid.org/) and aligned using MUSCLE version 3.8.31 [11].
    Resource: MUSCLE
    Suggested: (MUSCLE, RRID:SCR_011812)

    Sentence: Maximum likelihood phylogenetic trees were constructed using IQ-TREE version 2.0.5 [20].
    Resource: IQ-TREE
    Suggested: (IQ-TREE, RRID:SCR_017254)

    Sentence: iSNVs analysis: To obtain high-confidence iSNVs, we first trimmed low-quality bases from the raw reads using Trimmomatic 0.39 with default parameters [22].
    Resource: Trimmomatic
    Suggested: (Trimmomatic, RRID:SCR_011848)

    Sentence: The passed reads were mapped to the SARS-CoV-2 reference genome using BWA mem version 0.7.17 with default parameters [23].
    Resource: BWA
    Suggested: (BWA, RRID:SCR_010910)

    Sentence: GATK MarkDuplicates version 4.1.3.0 was used to mark duplicate reads [24,25].
    Resource: GATK
    Suggested: (GATK, RRID:SCR_001876)

    Sentence: Substitution rates: Substitution rates for the Shanghai samples were assessed using the Bayesian Markov chain Monte Carlo (MCMC) implemented in BEAST v1.8.4 [15].
    Resource: BEAST
    Suggested: (BEAST, RRID:SCR_010228)
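
The sentences quoted above describe read trimming, mapping and duplicate marking as preparation for calling high-confidence iSNVs, but the calling criteria themselves are not quoted in this report. As a rough illustration of what such a step could look like, the Python sketch below flags a position as a candidate iSNV when a second allele is present above a frequency cutoff at sufficient depth. The function name `call_isnv` and the thresholds `min_depth=100` and `min_freq=0.05` are hypothetical placeholders, not the authors' settings.

```python
def call_isnv(base_counts, min_depth=100, min_freq=0.05):
    """Return (major_allele, minor_allele, minor_frequency) for one
    genome position, or None if the position does not qualify.

    base_counts maps each base to the number of reads supporting it,
    for example {"A": 900, "G": 100}. Both thresholds are illustrative
    assumptions and are not taken from the preprint.
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return None  # too few reads to trust any frequency estimate

    # Rank alleles by read support (ties broken alphabetically).
    ranked = sorted(base_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if len(ranked) < 2:
        return None  # only one allele was counted at this position

    (major, _), (minor, minor_count) = ranked[0], ranked[1]
    minor_freq = minor_count / depth
    if minor_count == 0 or minor_freq < min_freq:
        return None  # minor allele absent or below the frequency cutoff
    return major, minor, minor_freq


print(call_isnv({"A": 900, "G": 100, "C": 0, "T": 0}))  # ('A', 'G', 0.1)
```

In practice the per-position counts would come from the duplicate-marked alignments, and real pipelines usually add further filters such as base quality and strand bias; those are omitted here to keep the sketch short.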

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.
