Global genetic diversity patterns and transmissions of SARS-CoV-2

Abstract

Background

Since it was first discovered in China, the SARS-CoV-2 epidemic has caused a substantial health emergency and economic stress worldwide. However, the global genetic diversity and transmission patterns of the virus remain unclear.

Methods

A total of 3050 SARS-CoV-2 genome sequences were retrieved from the GISAID database. After alignment with MAFFT, mutation patterns were identified by phylogenetic tree analysis.
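
The abstract does not say how the tree was built; the SciScore report in the activity feed quotes the manuscript as using average linkage (UPGMA) in MEGA X. The sketch below illustrates the general idea of average-linkage clustering on already-aligned sequences. It is not the authors' pipeline: the raw Hamming distance, the toy sequences, and the names `hamming` and `upgma` are assumptions made for illustration, and the bootstrap step is omitted.

```python
from __future__ import annotations

from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def upgma(seqs: dict[str, str]) -> str:
    """Cluster aligned sequences by average linkage; return the nested topology."""
    # Each cluster label maps to the number of leaves it contains.
    size = {name: 1 for name in seqs}
    # Pairwise distances, keyed by the unordered pair of cluster labels.
    dist = {
        frozenset((a, b)): float(hamming(seqs[a], seqs[b]))
        for a, b in combinations(seqs, 2)
    }
    while len(size) > 1:
        # Pick the closest pair; ties are broken by label for determinism.
        pair = min(dist, key=lambda k: (dist[k], sorted(k)))
        a, b = sorted(pair)
        merged = f"({a},{b})"
        # Keep distances that do not involve the two merged clusters.
        updated = {k: v for k, v in dist.items() if a not in k and b not in k}
        for c in size:
            if c in (a, b):
                continue
            # Size-weighted mean of the old distances (the UPGMA update rule).
            updated[frozenset((merged, c))] = (
                size[a] * dist[frozenset((a, c))] + size[b] * dist[frozenset((b, c))]
            ) / (size[a] + size[b])
        dist = updated
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Hypothetical toy alignment, not real SARS-CoV-2 data.
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TCGTTCGT", "s4": "TCGTTCGA"}
print(upgma(toy))
```

A real analysis would typically use a substitution-model distance and assess branch support by resampling alignment columns, which this sketch leaves out.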

Results

We detected 17 high-frequency (>6%) mutations in the 3050 sequences. Based on these mutations, we classified SARS-CoV-2 into four main groups and 10 subgroups. Group A was found mainly in Asia, group B primarily in North America, group C predominantly in Asia and Oceania, and group D principally in Europe and Africa. The distribution of these groups differed by age but was similar across genders. Groups A, B1 and C2 declined over time, whereas groups B2, C3 and D rose. Finally, we found two apparent expansion stages (late January 2020, and late February to early March 2020). Notably, most groups are expanding quickly, especially group D.
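
As a rough illustration of what a frequency cutoff like the >6% above can mean in practice, the following sketch counts substitutions relative to a reference across aligned sequences and keeps those carried by more than a given fraction of samples. It assumes that "frequency" means the share of sequences carrying the substitution; the function name `high_frequency_mutations`, the toy sequences, and the handling of gaps and ambiguous bases are hypothetical and not taken from the manuscript.

```python
from __future__ import annotations

from collections import Counter


def high_frequency_mutations(
    reference: str, aligned: list[str], threshold: float = 0.06
) -> dict[tuple[int, str, str], float]:
    """Return {(position, ref_base, alt_base): frequency} for substitutions
    whose frequency across the aligned sequences exceeds the threshold."""
    counts: Counter = Counter()
    for seq in aligned:
        if len(seq) != len(reference):
            raise ValueError("every sequence must be aligned to the reference")
        # Positions are 1-based, as is conventional for genome coordinates.
        for pos, (ref_base, base) in enumerate(zip(reference, seq), start=1):
            # Assumption: gaps and ambiguous bases are ignored, not counted.
            if base != ref_base and base in "ACGT" and ref_base in "ACGT":
                counts[(pos, ref_base, base)] += 1
    total = len(aligned)
    return {mut: n / total for mut, n in counts.items() if n / total > threshold}


# Hypothetical toy alignment: 20 sequences against an 8-base reference.
reference = "ACGTACGT"
aligned = ["ACGTACGT"] * 17 + ["ATGTACGT"] * 2 + ["ACGTGCGT"]
print(high_frequency_mutations(reference, aligned))
```

In this toy set the substitution at position 2 is carried by 2 of 20 sequences and passes the cutoff, while the one at position 5 is carried by 1 of 20 and does not.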

Conclusions

For the first time, we classified SARS-CoV-2 into four main groups and 10 subgroups based on different mutation patterns. The distribution of the 10 subgroups differed by geography, time and age, but not by gender. Most groups are expanding rapidly, especially group D. Therefore, these genetic diversity patterns of SARS-CoV-2 deserve attention, and more targeted measures should be taken to constrain its spread.

Article activity feed

  1. SciScore for 10.1101/2020.05.05.20091413:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms

    Sentence: The complete genome sequences were aligned with the reference genome of SARS-CoV-2 (NC_045512) by MAFFT (version 7).
    Resource: MAFFT; suggested: (MAFFT, RRID:SCR_011811)

    Sentence: The aligned genomes were then edited manually according to the reference sequence using BioEdit software.
    Resource: BioEdit; suggested: (BioEdit, RRID:SCR_007361)

    Sentence: To make the mutations easier to visualize, a heatmap was drawn with the pheatmap package in R.
    Resource: pheatmap; suggested: (pheatmap, RRID:SCR_016418)

    Sentence: The average linkage (UPGMA) method with 100 bootstrap replicates was used to construct the phylogenetic tree in MEGA X software.
    Resource: MEGA X; suggested: None

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Our study has some limitations due to the nature of the SARS-CoV-2 genome data. First, the sample collection dates may not reflect the actual infection dates, and not all affected countries uploaded their sequences to the GISAID database in a timely manner, so the analysis of mutation pattern transmission is only approximate. Second, because some regions (such as Africa) have not sequenced enough virus samples, or some countries uploaded samples from a single source (such as the Diamond Princess cruise ship in Japan), the mutation patterns may be biased for specific countries or continents. Nevertheless, our study included the most complete set of SARS-CoV-2 sequences available to date. Whether the different mutation patterns of SARS-CoV-2 will result in biological and clinical differences remains to be determined. In conclusion, we classified SARS-CoV-2 into four main groups and 10 subgroups based on different mutation patterns for the first time. The distribution of the 10 subgroups varied by geography, time and age, but not by gender. After two apparent expansion stages, most groups are expanding rapidly, especially group D. Therefore, we should pay attention to these genetic diversity patterns of SARS-CoV-2 and take more targeted measures to control its spread.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.