Coronavirus genomes carry the signatures of their habitats

Abstract

Coronaviruses such as SARS-CoV-2 regularly infect host tissues that express antiviral proteins (AVPs) in abundance. Understanding how they evolve to adapt to or evade host immune responses is important in the effort to control the spread of COVID-19. Two AVPs that may shape viral genomes are the zinc finger antiviral protein (ZAP) and the apolipoprotein B mRNA-editing enzyme-catalytic polypeptide-like 3 protein (APOBEC3). The former binds to CpG dinucleotides to facilitate the degradation of viral transcripts, while the latter deaminates C into U residues, leading to dysfunctional transcripts. We tested the hypothesis that both APOBEC3 and ZAP may act as primary selective pressures that shape the genome of an infecting coronavirus by considering a comprehensive number of publicly available genomes for seven coronaviruses (SARS-CoV-2, SARS-CoV, MERS, Bovine CoV, Murine MHV, Porcine HEV, and Canine CoV). We show that coronaviruses that regularly infect tissues with abundant AVPs have CpG-deficient and U-rich genomes, whereas viruses that do not infect tissues with abundant AVPs do not share these sequence hallmarks. In SARS-CoV-2, CpG is most deficient in the S protein region, consistent with evasion of ZAP-mediated antiviral defense during cell entry. Furthermore, over four months of SARS-CoV-2 evolutionary history, we observed a marked increase in C to U substitutions in the 5’ UTR and ORF1ab regions. This suggests that the two regions could be under constant C to U deamination by APOBEC3. The evolutionary pressures exerted by host immune systems onto viral genomes may motivate novel strategies for SARS-CoV-2 vaccine development.
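
The two sequence hallmarks named in the abstract, CpG deficiency and U-richness, can be quantified in a few lines. The sketch below is illustrative only and is not taken from the preprint: it assumes the widely used observed/expected index, I_CpG = f(CpG) / (f(C) × f(G)), where values below 1 indicate CpG deficiency. The function names `cpg_index` and `u_fraction` are hypothetical.

```python
from collections import Counter


def cpg_index(seq: str) -> float:
    """Observed/expected CpG ratio: f(CG) / (f(C) * f(G)).

    Assumption: this is the common odds-ratio style index, not
    necessarily the exact statistic used by the authors.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA spelling
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    counts = Counter(seq)
    f_c = counts["C"] / n
    f_g = counts["G"] / n
    if f_c == 0 or f_g == 0:
        return 0.0  # no C or no G, so no CpG is possible
    # Count overlapping dinucleotides that read C followed by G.
    n_cg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    f_cg = n_cg / (n - 1)
    return f_cg / (f_c * f_g)


def u_fraction(seq: str) -> float:
    """Fraction of positions that are U (T in DNA spelling)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper().replace("T", "U")
    return seq.count("U") / len(seq)
```

Applied to a whole genome or to a single region such as the S gene, an index well below 1 together with a high U fraction would correspond to the CpG-deficient, U-rich pattern described above.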

Article activity feed

  1. SciScore for 10.1101/2020.06.13.149591:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: Next, among 2666 high sequence quality and complete SARS-CoV-2 genomes from CNCB, we randomly selected one genome from each collection date, inclusively between December 31, 2019 (first isolate) and May 6, 2020 (most recent isolate, database last accessed on May 16, 2020), that have complete records of local region annotations and nucleotide sequences in NCBI.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.
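
The randomization step quoted in Table 1 (one genome drawn at random per collection date) can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (accession, collection date), the function name `pick_one_per_date`, and the fixed seed are all assumptions made for the example.

```python
import random
from collections import defaultdict


def pick_one_per_date(records, seed=0):
    """Randomly choose one accession for each collection date.

    `records` is an iterable of (accession, collection_date) pairs.
    The seed is an assumption added here so the draw is reproducible.
    """
    rng = random.Random(seed)
    by_date = defaultdict(list)
    for accession, date in records:
        by_date[date].append(accession)
    # Sort dates and accessions so the result does not depend on input order.
    return {
        date: rng.choice(sorted(accessions))
        for date, accessions in sorted(by_date.items())
    }
```

With records such as `[("genome_A", "2020-01-05"), ("genome_B", "2020-01-05"), ("genome_C", "2020-01-06")]` (placeholder accessions), the function returns one accession for each of the two dates.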

    Table 2: Resources

    Experimental Models: Organisms/Strains

    Sentence: A total of 99 variants (or samples) were retrieved across 127 days since SARS-CoV-2 (strain Wuhan-Hu-1, MN908947) was first sequenced.
    Resource: Wuhan-Hu-1
    Suggested: None

    Software and Algorithms

    Sentence: Specifically, we calculated the proportion of mRNA expression (PME) as:

    PME values were calculated from averaged TPM values in 24 human tissues using all RNA-Seq datasets available in the GTEx Portal (Lonsdale et al. 2013), from averaged FPKM values in 26 cattle tissues using the Bovine Genome Database (Shamimuzzaman et al. 2019), from averaged FPKM values in 33 pig tissues using TISSUE 2.0 integrated datasets (Palasca et al. 2018), from averaged FPKM values in 17 mice tissues using all 741 RNA-Seq datasets in mouse ENCODE consortium (Yue et al. 2014), from averaged FPKM values in 12 mice tissues using 79 RNA-Seq datasets in BioProject PRJNA516470 (Naqvi et al. 2019), and from averaged fluorescence intensity units in 10 dog tissues using all 39 microarray datasets in BioProject PRJNA124245 (Briggs et al. 2011).
    Resource: BioProject
    Suggested: (NCBI BioProject, RRID:SCR_004801)

    Sentence: Additionally, the complete genomic sequences of 403 MERS strains, 134 SARS-CoV strains, 20 Bovine CoV strains, 2 Canine CoV strains, 26 Murine MHV strains, and 10 Porcine HEV strains were downloaded from the National Center for Biotechnology Information (NCBI) Nucleotide Database (https://www.ncbi.nlm.nih.gov/).
    Suggested: (GENSAT at NCBI - Gene Expression Nervous System Atlas, RRID:SCR_003923)

    Sentence: To make a fair comparison between strains, the genomes were aligned with MAFFT version 7 (Katoh and Standley 2013), with the slow but accurate G-INS-1 option for 134 SARS-CoV, 20 Bovine CoV, 2 Canine CoV, 26 Murine MHV, and 10 Porcine HEV strains, and with the fast FFT-NS-2 option for large alignments for 2666 SARS-CoV-2 and 403 MERS strains.
    Resource: MAFFT
    Suggested: (MAFFT, RRID:SCR_011811)
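
Once genomes are aligned as described in the last sentence of Table 2, substitutions such as the C to U changes reported in the abstract can be tallied column by column. The sketch below is a hedged illustration, not the authors' method: it assumes a pairwise comparison of one aligned variant against one aligned reference, and the function name `count_substitutions` is hypothetical.

```python
def count_substitutions(ref: str, var: str, src: str = "C", dst: str = "U") -> int:
    """Count aligned columns where the reference has `src` and the variant has `dst`.

    Both sequences must come from the same alignment and therefore have
    equal length. Gap columns ('-') never match `src` or `dst`, so they
    are skipped implicitly.
    """
    if len(ref) != len(var):
        raise ValueError("aligned sequences must have equal length")
    # Normalise case and DNA/RNA spelling so T and U are treated alike.
    ref = ref.upper().replace("T", "U")
    var = var.upper().replace("T", "U")
    return sum(1 for r, v in zip(ref, var) if r == src and v == dst)
```

Restricting `ref` and `var` to the alignment columns of a single region (for example the 5’ UTR or ORF1ab) gives a per-region count that can be compared across collection dates.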

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.