Tracing two causative SNPs reveals SARS-CoV-2 transmission in North America population

Abstract

During the COVID-19 pandemic, precisely tracing the route of SARS-CoV-2 transmission in the human population remains challenging because this RNA virus can mutate extensively and lacks a specific tracing marker. Herein, using a geographically stratified genome-wide association study (GWAS) of 2599 full-genome sequences, we identified two SNPs in linkage disequilibrium (1059.C>T and 25563.G>T) that were present in approximately half of the North American SARS-CoV-2 population (p = 2.44 × 10^−212 and p = 2.98 × 10^−261), resulting in two missense mutations (Thr265Ile and Gln57His) in ORF1ab and ORF3a, respectively. Interestingly, these two SNPs occurred exclusively in the North America-dominated clade 1 and accumulated during mid to late March 2020. Retrospective tracing found neither SNP in bat- and pangolin-related SARS-CoV-2 or in human SARS-CoV-2 from the first epicenter, Wuhan, or from other regions of mainland China. This suggests that the SARS-CoV-2 population of mainland China differed from the prevalent strains of North America. Over time, we found that these two SNPs first occurred in European SARS-CoV-2 isolates (26-Feb-2020), 3 days earlier than in North American isolates and 17 days earlier than in Asian isolates (dominated by Taiwan, China). Collectively, this population genetic analysis highlights a high-confidence transmission route for the North American isolates, and the two newly identified SNPs are possible novel diagnostic or druggable targets for surveillance and treatment.
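
The case/control comparison described in the abstract can be illustrated with a basic allelic association test. The following is a minimal sketch, not the authors' PLINK procedure: it assumes each haploid viral genome contributes one allele per site and applies a Pearson chi-square test (1 degree of freedom) to a 2×2 table of allele counts. The function name `allelic_chi_square` and the allele counts in the usage lines are hypothetical; only the group sizes (1063 North American cases, 1536 controls) come from the source.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p). Assumes haploid genomes, so each genome
    contributes exactly one allele (alt or ref) at the site.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_case = case_alt + case_ref
    row_ctrl = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    denom = row_case * row_ctrl * col_alt * col_ref
    if denom == 0:
        raise ValueError("every row and column of the table must be non-empty")
    # Shortcut formula for a 2x2 table: n * (ad - bc)^2 / (product of margins)
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 df
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Hypothetical allele counts; only the group sizes (1063 and 1536) are from the source.
chi2, p = allelic_chi_square(case_alt=530, case_ref=533, ctrl_alt=40, ctrl_ref=1496)
print(f"chi2 = {chi2:.1f}, p = {p:.3g}")
```

For very large test statistics the double-precision p-value can underflow to 0.0, so a log-scale calculation would be needed in that regime.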

Article activity feed

  1. SciScore for 10.1101/2020.05.12.092056:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Identical redundant sequences were further removed by CD-HIT software (version 4.8.1, parameters: -aL 1 -aS 1 -c 1 -s 1) (10).
    CD-HIT
    suggested: (CD-HIT, RRID:SCR_007105)
    Phylogenetic analysis: The 2599 full-genome sequences were aligned by MAFFT software (version 7.407, parameter: --auto)(11).
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    2.3 SNP calling: The single nucleotide polymorphisms (SNPs) and small insertion-deletion (INDELs) polymorphisms were detected by MUMmer software (version 3.0, nucmer, show-snps) (12) using the Wuhan-Hu-1 strain (GISAID: EPI_ISL_402125, Genbank: NC_045512.2) as a reference genome.
    Genbank
    suggested: (GenBank, RRID:SCR_002760)
    To validate identity of the above polymorphisms, raw reads (40 out of 2599 strains, NCBI SRA database) were analyzed by bwa (version 0.7.16a) (13) and mpileup program of samtools software (version 1.10)(14).
    NCBI SRA
    suggested: None
    samtools
    suggested: (SAMTOOLS, RRID:SCR_002105)
    2.4 Genome-wide association study (GWAS) and Linkage disequilibrium(LD) analysis: In order to identify causative SNPs in population of North America SARS-CoV-2 (cases=1063, controls=1536), a geographic stratified genome-wide association study against 5312 mutations was performed using PLINK software (version 1.90) (15).
    PLINK
    suggested: (PLINK, RRID:SCR_001757)
    Consequently, 21 significant SNPs were detected (Table 1), and the LD between SNP pairs was estimated and visualized by Haploview software (version 4.1) (17).
    Haploview
    suggested: (Haploview, RRID:SCR_003076)
    These analyses were performed by Microsoft® Excel 2016 (Table S2). 2.6. Statistics: Data from 2.5 were plotted by Graphpad (Version 8.2.1 for Windows, San Diego, CA).
    Microsoft® Excel
    suggested: (Microsoft Excel, RRID:SCR_016137)
    Graphpad
    suggested: (GraphPad, RRID:SCR_000306)
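
The linkage disequilibrium (LD) step listed above can be illustrated for a single pair of sites. This is a minimal sketch, not Haploview's implementation: it assumes haploid genomes, so each sequence is a directly observed haplotype and no phasing is needed, and it codes each site as 0 (reference allele) or 1 (alternate allele). The function name `pairwise_ld` and the toy genotype vectors are hypothetical.

```python
def pairwise_ld(site_a, site_b):
    """Return (D, D_prime, r2) for two biallelic sites.

    site_a and site_b are equal-length sequences of 0/1 alleles,
    one entry per haploid genome.
    """
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("sites must be non-empty and of equal length")
    n = len(site_a)
    p_a = sum(site_a) / n                                   # alt frequency at site A
    p_b = sum(site_b) / n                                   # alt frequency at site B
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n   # both alt on one genome
    var = p_a * (1 - p_a) * p_b * (1 - p_b)
    if var == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b
    # Largest |D| attainable given the allele frequencies
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d != 0 else 0.0
    r2 = d * d / var
    return d, d_prime, r2


# Toy data: two sites whose alternate alleles always co-occur (complete LD).
print(pairwise_ld([1, 1, 0, 0], [1, 1, 0, 0]))
```

Two sites whose alternate alleles always appear on the same genomes give r² = 1, while independent sites give values near 0.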

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.