Intra-genome variability in the dinucleotide composition of SARS-CoV-2

Abstract

CpG dinucleotides are under-represented in the genomes of single-stranded RNA viruses, and SARS-CoV-2 is no exception to this. Artificial modification of CpG frequency is a valid approach for live attenuated vaccine development; if this is to be applied to SARS-CoV-2, we must first understand the role CpG motifs play in regulating SARS-CoV-2 replication. Accordingly, the CpG composition of the SARS-CoV-2 genome was characterised. CpG suppression among coronaviruses does not differ between virus genera but does vary with host species and primary replication site (a proxy for tissue tropism), supporting the hypothesis that viral CpG content may influence cross-species transmission. Although SARS-CoV-2 exhibits overall strong CpG suppression, this varies considerably across the genome, and the Envelope (E) open reading frame (ORF) and ORF10 demonstrate an absence of CpG suppression. Across the Coronaviridae, E genes display remarkably high variation in CpG composition, with those of SARS and SARS-CoV-2 having much higher CpG content than other coronaviruses isolated from humans. This is an ancestrally derived trait reflecting their bat origins. Conservation of CpG motifs in these regions suggests that they have a functionality which over-rides the need to suppress CpG; an observation relevant to future strategies towards a rationally attenuated SARS-CoV-2 vaccine.

Article activity feed

  1. SciScore for 10.1101/2020.05.08.083816:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Sequences were annotated into animal groups and genera based on their description in the NCBI database.
    NCBI
    suggested: (NCBI, RRID:SCR_006472)
    Bats are of the order Chiroptera; multiple avian orders were grouped together (Galliformes, Anseriformes, Passeriformes, Gruiformes, Columbiformes and Pelecaniformes); even-toed (Artiodactyla) and odd-toed (Perissodactyla) ungulate orders were grouped, with camelids analysed separately due to their association with MERS-CoV (Azhar, et al. 2014); Canidae (canine) and Pantherinae (feline) sequences of the Carnivora order were analysed separately, as canines have previously been suggested as an intermediate host species for SARS-CoV-2 (Xia 2020) and cat infections with SARS-CoV-2 have been reported (Shi, et al. 2020); humans were the only representatives from the Primate order; all remaining Carnivora, with the exception of a single civet sequence, belonged to the Mustelidae (mustelids); rodents belong to the Rodentia order; and swine belong to the Artiodactyla order; whales are also Artiodactyla, but swine were considered separately due to considerable interest in porcine coronaviruses (Vlasova, et al. 2020).
    SARS-CoV-2
    suggested: (Active Motif Cat# 91351, RRID:AB_2847848)
    Phylogenetic analyses: E ORFs were aligned in MEGA X (Kumar, et al. 2018) using the Clustal method.
    MEGA
    suggested: (Mega BLAST, RRID:SCR_011920)
    Statistical analyses: Comparison to determine whether there was a statistically significant difference across groups was performed using a 1-way ANOVA in GraphPad Prism.
    GraphPad Prism
    suggested: (GraphPad Prism, RRID:SCR_002798)
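
For readers unfamiliar with the test named above, the following is a minimal sketch of how a one-way ANOVA F statistic is computed from grouped values. It is not the authors' analysis, which was run in GraphPad Prism; the function name `one_way_anova_f` and the example numbers are hypothetical and purely illustrative, and the sketch stops at the F statistic and degrees of freedom without deriving a P value.

```python
from typing import List, Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Hypothetical helper for illustration; each inner sequence holds the
    values (e.g. CpG O:E ratios) observed in one group.
    """
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")

    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")

    group_means: List[float] = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up values, not data from the study.
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
```

A larger F indicates that differences between group means are large relative to the spread within groups; turning F into a P value requires the F distribution with the two degrees of freedom returned here.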

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Another limitation of this analysis is that only sequences of greater than 10% divergence were included. Tissue tropism can be defined by much smaller differences; for example, a deletion in the spike protein of transmissible gastroenteritis virus (a porcine coronavirus) altered the tropism of the virus from enteric to respiratory, while nucleotide identity was preserved at 96% (Cox, et al. 1990; Rasschaert, et al. 1990). Further study on tissue tropisms of coronaviruses, as well as tissue expression profiles and antiviral activities of ZAP are needed to validate these analyses. Loss of CpG motifs during adaptation to the human host has been previously described for influenza A virus (Greenbaum, et al. 2008), highlighting the importance of CpG composition for host adaptation. For SARS-CoV-2, we determined a genomic CpG O:E ratio of 0.408, which is similar to the human genome CpG O:E ratio of 0.2-0.4 (McClelland and Ivarie 1982; Sved and Bird 1990; Tomso and Bell 2003). Mimicry of the CpG composition of the host by ssRNA viruses is considered a mechanism to subvert detection by the innate immune response (Simmonds, et al. 2013; Takata, et al. 2017) and speculatively this may indicate that SARS-CoV-2 was genetically predisposed to make a host switch into humans. Similarly, the genomic CPB score of 0.048 indicates that SARS-CoV-2 uses codon pairs which are preferentially utilised in the human ORFeome, which may mean that the virus was well suited for translational efficiency in ...
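
The CpG O:E (observed:expected) ratio quoted above can be illustrated with a short sketch. It assumes the commonly used definition, the number of CG dinucleotides multiplied by sequence length and divided by the product of the C and G counts; the excerpt does not state the exact formula the authors used, so this is an assumption, and the function name `cpg_oe_ratio` is hypothetical.

```python
def cpg_oe_ratio(sequence: str) -> float:
    """Observed:expected CpG ratio of a nucleotide sequence.

    Assumed definition (not confirmed by the source):
        O:E = (count(CG) * N) / (count(C) * count(G))
    where N is the sequence length. Values below 1 indicate CpG
    suppression; values near 1 indicate no suppression.
    """
    # Normalise case and treat RNA (U) and DNA (T) alphabets alike.
    seq = sequence.upper().replace("U", "T")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")

    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        # No CpG is possible, so the observed count is zero.
        return 0.0

    # "CG" cannot overlap with itself, so str.count is exact here.
    cpg_count = seq.count("CG")
    return (cpg_count * n) / (c_count * g_count)


# Toy sequences for illustration only, not viral data.
balanced = cpg_oe_ratio("CCGG")   # one CG among two C and two G
enriched = cpg_oe_ratio("CGCG")   # every C is followed by G
```

Applied per ORF rather than to a whole genome, the same calculation gives the kind of region-by-region comparison described in the abstract.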

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.