Evidence for strong mutation bias towards, and selection against, T/U content in SARS-CoV2: implications for attenuated vaccine design

This article has been Reviewed by the following groups

Read the full article

Abstract

Large-scale re-engineering of synonymous sites is a promising strategy to generate attenuated viruses for vaccines. Attenuation typically relies on de-optimisation of codon pairs and maximization of CpG dinculeotide frequencies. So as to formulate evolutionarily-informed attenuation strategies, that aim to force nucleotide usage against the estimated direction favoured by selection, here we examine available whole-genome sequences of SARS-CoV2 to infer patterns of mutation and selection on synonymous sites. Analysis of mutational profiles indicates a strong mutation bias towards T with concomitant selection against T. Accounting for dinucleotide effects reinforces this conclusion, observed TT content being a quarter of that expected under neutrality. A significantly different mutational profile at CDS sites that are not 4-fold degenerate is consistent with contemporaneous selection against T mutations more widely. Although selection against CpG dinucleotides is expected to drive synonymous site G+C content below mutational equilibrium, observed G+C content is slightly above equilibrium, possibly because of selection for higher expression. Consistent with gene-specific selection against CpG dinucleotides, we observe systematic differences of CpG content between SARS-CoV2 genes. We propose an evolutionarily informed gene-bespoke approach to attenuation that, unusually, seeks to increase usage of the already most common synonymous codons. Comparable analysis of H1N1 and Ebola finds that GC3 deviated from neutral equilibrium is not a universal feature, cautioning against generalization of results.

Article activity feed

  1. SciScore for 10.1101/2020.05.11.088112: (What is this?)

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    SentencesResources
    CDSs were then translated using BioPython, re-aligned using MAFFT, and then reversed translated using TranslatorX (
    BioPython
    suggested: (Biopython, RRID:SCR_007173)
    TranslatorX
    suggested: (TranslatorX, RRID:SCR_014733)
    Additionally, H1N1 influenza A pdm09 sequences for strains collected between January 2009 and August 2010 that contained segments PB2, PB1, PA, HA, NP, NA, MP and NS were obtained from GISAID (Shu and McCauley 2017) for 4 segments: RNA polymerase subunit (PB2), hemagglutinin (HA), nucleoprotein (NP), and neuraminidase (NA).
    August
    suggested: None
    Remaining sequences were translated aligned to the reference strain using MAFFT, and reverse translated to nucleotides using TranslatorX (Abascal, et al. 2010).
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Subsequently, the MSA and the resulting tree were used to identify recurrent mutations (homoplasies) using HomoplasyFinder (Crispell, et al. 2019).
    HomoplasyFinder
    suggested: (HomoplasyFinder, RRID:SCR_017300)
    These were solved in NumPy.
    NumPy
    suggested: (NumPy, RRID:SCR_008633)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We found bar graphs of continuous data. We recommend replacing bar graphs with more informative graphics, as many different datasets can lead to the same bar graph. The actual data may suggest different conclusions from the summary statistics. For more information, please see Weissgerber et al (2015).


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.