Evidence for strong mutation bias towards, and selection against, T/U content in SARS-CoV2: implications for attenuated vaccine design
Abstract
Large-scale re-engineering of synonymous sites is a promising strategy to generate attenuated viruses for vaccines. Attenuation typically relies on de-optimisation of codon pairs and maximization of CpG dinucleotide frequencies. To formulate evolutionarily informed attenuation strategies that aim to force nucleotide usage against the estimated direction favoured by selection, here we examine available whole-genome sequences of SARS-CoV2 to infer patterns of mutation and selection on synonymous sites. Analysis of mutational profiles indicates a strong mutation bias towards T with concomitant selection against T. Accounting for dinucleotide effects reinforces this conclusion, observed TT content being a quarter of that expected under neutrality. A significantly different mutational profile at CDS sites that are not 4-fold degenerate is consistent with contemporaneous selection against T mutations more widely. Although selection against CpG dinucleotides is expected to drive synonymous site G+C content below mutational equilibrium, observed G+C content is slightly above equilibrium, possibly because of selection for higher expression. Consistent with gene-specific selection against CpG dinucleotides, we observe systematic differences in CpG content between SARS-CoV2 genes. We propose an evolutionarily informed gene-bespoke approach to attenuation that, unusually, seeks to increase usage of the already most common synonymous codons. Comparable analysis of H1N1 and Ebola finds that GC3 deviating from neutral equilibrium is not a universal feature, cautioning against generalization of results.
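The abstract compares observed nucleotide content with the content expected at mutational equilibrium. As a rough illustration of what "mutational equilibrium" means, the following Python sketch computes the stationary nucleotide frequencies implied by a table of per-site mutation rates. Everything in it is an assumption made for illustration: the function name `equilibrium_frequencies`, the `example_rates` values (invented, T-biased, and not estimates from the paper), and the use of simple power iteration with the standard library rather than whatever solver the authors used.

```python
NUCS = "ACGT"

def equilibrium_frequencies(rates, tol=1e-12, max_iter=100_000):
    """Stationary nucleotide frequencies under a mutation-only model.

    rates[x][y] is the relative rate of x -> y mutation per site (x != y).
    All rates are assumed positive so that a unique equilibrium exists.
    """
    # Rescale so that each nucleotide keeps at least probability 0.5 of
    # staying unchanged per step; this gives valid transition probabilities.
    max_out = max(sum(rates[x][y] for y in NUCS if y != x) for x in NUCS)
    scale = 0.5 / max_out

    freqs = {n: 0.25 for n in NUCS}  # start from uniform composition
    for _ in range(max_iter):
        new = {}
        for y in NUCS:
            stay = 1.0 - scale * sum(rates[y][z] for z in NUCS if z != y)
            inflow = sum(freqs[x] * scale * rates[x][y] for x in NUCS if x != y)
            new[y] = freqs[y] * stay + inflow
        if max(abs(new[n] - freqs[n]) for n in NUCS) < tol:
            return new
        freqs = new
    return freqs

# Hypothetical relative rates with elevated C->T and G->T mutation.
example_rates = {
    "A": {"C": 1.0, "G": 2.0, "T": 1.0},
    "C": {"A": 1.0, "G": 1.0, "T": 6.0},
    "G": {"A": 2.0, "C": 1.0, "T": 4.0},
    "T": {"A": 1.0, "C": 2.0, "G": 1.0},
}

eq = equilibrium_frequencies(example_rates)
gc_equilibrium = eq["G"] + eq["C"]
print({n: round(eq[n], 4) for n in NUCS}, "G+C:", round(gc_equilibrium, 4))
```

With these made-up rates the equilibrium is T-rich and G+C-poor. An observed T content well below such an equilibrium value, or an observed G+C content above it, is the kind of discrepancy the abstract interprets as evidence of selection acting against the direction of mutation bias.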
Article activity feed
SciScore for 10.1101/2020.05.11.088112:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms

| Sentences | Resources |
|---|---|
| CDSs were then translated using BioPython, re-aligned using MAFFT, and then reversed translated using TranslatorX ( | BioPython, suggested: (Biopython, RRID:SCR_007173); TranslatorX, suggested: (TranslatorX, RRID:SCR_014733) |
| Additionally, H1N1 influenza A pdm09 sequences for strains collected between January 2009 and August 2010 that contained segments PB2, PB1, PA, HA, NP, NA, MP and NS were obtained from GISAID (Shu and McCauley 2017) for 4 segments: RNA polymerase subunit (PB2), hemagglutinin (HA), nucleoprotein (NP), and neuraminidase (NA). | August, suggested: None |
| Remaining sequences were translated aligned to the reference strain using MAFFT, and reverse translated to nucleotides using TranslatorX (Abascal, et al. 2010). | MAFFT, suggested: (MAFFT, RRID:SCR_011811) |
| Subsequently, the MSA and the resulting tree were used to identify recurrent mutations (homoplasies) using HomoplasyFinder (Crispell, et al. 2019). | HomoplasyFinder, suggested: (HomoplasyFinder, RRID:SCR_017300) |
| These were solved in NumPy. | NumPy, suggested: (NumPy, RRID:SCR_008633) |

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We found bar graphs of continuous data. We recommend replacing bar graphs with more informative graphics, as many different datasets can lead to the same bar graph. The actual data may suggest different conclusions from the summary statistics. For more information, please see Weissgerber et al (2015).
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.