The discovery of gene mutations making SARS-CoV-2 well adapted for humans: host-genome similarity analysis of 2594 genomes from China, the USA and Europe

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a positive-sense single-stranded RNA virus approximately 30 kb in length, causes the ongoing novel coronavirus disease-2019 (COVID-19). Studies have confirmed significant genome differences between SARS-CoV-2 and SARS-CoV, suggesting that the distinctions in pathogenicity might be related to genomic diversity. However, the relationship between genomic differences and SARS-CoV-2 fitness has not been fully explained, especially for open reading frame (ORF)-encoded accessory proteins. RNA viruses have a high mutation rate, but how SARS-CoV-2 mutations accelerate adaptation is not clear. This study shows that the host-genome similarity (HGS) of SARS-CoV-2 is significantly higher than that of SARS-CoV, especially in the ORF6 and ORF8 genes encoding proteins antagonizing innate immunity in vivo. A power law relationship was discovered between the HGS of ORF3b, ORF6, and N and the expression of interferon (IFN)-sensitive response element (ISRE)-containing promoters. This finding implies that the high HGS of the SARS-CoV-2 genome may further inhibit IFN I synthesis and cause delayed host innate immunity. An ORF1ab mutation, 10818G>T, which occurred in virus populations with high HGS but rarely in low-HGS populations, was identified in 2594 genomes with geolocations of China, the USA and Europe. The 10818G>T mutation caused the amino acid mutation M37F in the transmembrane protein nsp6. The results suggest that the ORF6 and ORF8 genes and the mutation M37F may play important roles in causing COVID-19. The findings demonstrate that HGS analysis is a promising way to identify important genes and mutations in adaptive strains, which may help in searching for potential targets for pharmaceutical agents.

Article activity feed

  1. SciScore for 10.1101/2020.09.03.280727:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Viral genome data: By using BLAST ORFfinder[14], 31 ORFs were detected in the RNA genome sequence (29903 nt) of SARS-CoV-2 (GenBank: MN908947.3).
    BLAST
    suggested: (BLASTX, RRID:SCR_001653)
    The SARS-CoV-2 genomes were obtained from the GISAID database[15].
    GISAID
    suggested: (GISAID, RRID:SCR_018279)
    The CDSs of the SARS-CoV-2 genome were identified by using MATLAB (https://www.mathworks.com/help/bioinfo/ref/seqshoworfs.html).
    MATLAB
    suggested: (MATLAB, RRID:SCR_001622)
    Human SARS-CoV genomes were collected from NCBI GenBank[16].
    NCBI GenBank
    suggested: None
    To facilitate the comparison of Blast results among different subgenomic groups, the original score is standardized to S’ by Blastn:
    Blastn
    suggested: (BLASTN, RRID:SCR_001598)
    The SARS-CoV genomes can be obtained at NCBI database (https://www.ncbi.nlm.nih.gov/).
    https://www.ncbi.nlm.nih.gov/
    suggested: (GENSAT at NCBI - Gene Expression Nervous System Atlas, RRID:SCR_003923)
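
    The first table entry describes detecting ORFs in the SARS-CoV-2 genome sequence. As a rough illustration of what an ORF scan does, the sketch below searches the three forward reading frames of a nucleotide string for ATG-to-stop stretches. It is not the ORFfinder or MATLAB seqshoworfs algorithm; the function name find_orfs, the min_len default, and the forward-strand-only restriction are all assumptions made for this example.

```python
# Minimal forward-strand ORF scan (illustrative only; an assumption-laden
# sketch, not the algorithm used by ORFfinder or MATLAB's seqshoworfs).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=75):
    """Return sorted (start, end, frame) tuples for ATG..stop ORFs.

    start is 0-based; end is exclusive and includes the stop codon.
    min_len is the minimum ORF length in nucleotides (stop codon included);
    the default of 75 is an arbitrary choice for this sketch.
    """
    # Accept RNA input by mapping U to T.
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end, frame))
                start = None  # close the ORF and look for the next ATG
    return sorted(orfs)
```

    For example, find_orfs("CCATGAAATTTTAGGG", min_len=9) reports a single 12-nt ORF starting at position 2 in frame 2. A real analysis would also scan the reverse complement and handle alternative start codons, which this sketch omits.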

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on pages 33, 34, 35, 39, 40 and 41. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.


    Results from rtransparent:
    • No conflict of interest statement was detected. If there are no conflicts, we encourage authors to explicitly state so.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.