Global Spread of SARS-CoV-2 Subtype with Spike Protein Mutation D614G is Shaped by Human Genomic Variations that Regulate Expression of TMPRSS2 and MX1 Genes

Abstract

The COVID-19 pandemic is a major human tragedy. Worldwide, SARS-CoV-2 has already infected over 3 million people and killed about 230,000. SARS-CoV-2 originated in China and, within three months, evolved into 10 additional subtypes. One particular subtype, carrying a non-silent (aspartate to glycine) mutation at the 614th position of the Spike protein (D614G), rapidly outcompeted the other pre-existing subtypes, including the ancestral one. We determined that the D614G mutation generates an additional serine protease (Elastase) cleavage site near the S1-S2 junction of the Spike protein. We also found that a single-nucleotide deletion (delC) at a known variant site (rs35074065) in a cis-eQTL of TMPRSS2 is extremely rare in East Asians but common in Europeans and North Americans. The delC allele facilitates entry of the 614G subtype into host cells, thus accelerating the spread of the 614G subtype in Europe and North America, where the delC allele is common. The delC allele at the cis-eQTL locus rs35074065 of TMPRSS2 leads to overexpression of both TMPRSS2 and a nearby gene, MX1. The cis-eQTL site rs35074065 overlaps with a transcription factor binding site shared by an activator (IRF1) and a repressor (IRF2). The IRF1 activator can bind to the variant delC allele, but the IRF2 repressor cannot. Thus, in an individual carrying the delC allele, there is activation without repression. On viral entry, IRF1-mediated upregulation of MX1 leads to neutrophil infiltration and to processing of the 614G-mutated Spike protein by neutrophil Elastase. The simultaneous processing of the 614G Spike protein by the serine proteases TMPRSS2 and Elastase facilitates entry of the 614G subtype into host cells. Thus, SARS-CoV-2, particularly the 614G subtype, has spread more easily and at higher frequency in Europe and North America, where the delC allele regulating expression of the host proteins TMPRSS2 and MX1 is common, than in East Asia, where this allele is rare.

Article activity feed

  1. SciScore for 10.1101/2020.05.04.075911:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: These 6424 sequences were aligned using MAFFT (33).
    Resource: MAFFT
    Suggested: (MAFFT, RRID:SCR_011811)

    Sentence: These estimates were further refined using RAxML.
    Resource: RAxML
    Suggested: (RAxML, RRID:SCR_006086)

    Sentence: Estimation of the number of segregating sites and values of Tajima’s D (35) for various clades were obtained using MEGA (36) and cross-validated with DNASP (37).
    Resource: MEGA
    Suggested: (Mega BLAST, RRID:SCR_011920)

    Sentence: Genotype data, for all significant eQTLs as well as the data on all variants on the TMPRSS2 gene, were extracted from 1000 Genomes dataset [https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/] using tabix (43) with the hg19 chromosomal coordinates as reference.
    Resource: 1000 Genomes
    Suggested: (1000 Genomes Project and AWS, RRID:SCR_008801)

    Sentence: Functional annotation of each identified variant was done using Annovar (44).
    Resource: Annovar
    Suggested: (ANNOVAR, RRID:SCR_012821)

    Sentence: For all these eQTLs and non-silent variants, we calculated the genetic distance, Fst (45), between pairs of major continental populations and regional subpopulations, using PLINK v1.9 (46) (https://www.cog-genomics.org/plink/).
    Resource: PLINK
    Suggested: (PLINK, RRID:SCR_001757)

    Sentence: At each site, for the two sequences containing the reference and the variant alleles, we used JASPAR (47) (http://jaspar.genereg.net/) to predict TF recognition sites with the default (80%) relative profile threshold.
    Resource: http://jaspar.genereg.net/
    Suggested: (JASPAR, RRID:SCR_003030)
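
One of the quoted sentences above reports the number of segregating sites and Tajima's D per clade, computed by the authors with MEGA and DNASP. For readers unfamiliar with the statistic, the following is a minimal sketch of the standard Tajima (1989) formula. It is not the authors' pipeline: the function name `tajimas_d` and the toy alignments are hypothetical, and the sketch assumes equal-length aligned sequences with no gaps or ambiguity codes.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences (illustrative sketch).

    Assumes no gaps or ambiguity codes. Returns NaN when there are no
    segregating sites, because the statistic is undefined in that case.
    """
    n = len(seqs)
    if n < 4:
        # The variance term is zero for n = 2 and n = 3.
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have equal length")

    # S: number of alignment columns with more than one distinct base.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        return float("nan")

    # pi: mean number of pairwise differences.
    n_pairs = n * (n - 1) / 2
    total_diff = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    pi = total_diff / n_pairs

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy alignments (hypothetical data, not taken from the study).
singletons = ["AAAA", "AAAT", "AATA", "ATAA"]   # only rare variants
balanced = ["AA", "AA", "TT", "TT"]             # intermediate-frequency variants
print(tajimas_d(singletons), tajimas_d(balanced))
```

In this sketch an excess of rare variants gives a negative value and an excess of intermediate-frequency variants gives a positive one, which is the usual way the sign of the statistic is read.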

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on page 27. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding formulaic information scattered throughout a paper and presenting it in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.