Evidence for strong mutation bias towards, and selection against, T/U content in SARS-CoV2: implications for attenuated vaccine design

Abstract

Large-scale re-engineering of synonymous sites is a promising strategy to generate attenuated viruses for vaccines. Attenuation typically relies on de-optimisation of codon pairs and maximization of CpG dinucleotide frequencies. To formulate evolutionarily informed attenuation strategies that aim to force nucleotide usage against the estimated direction favoured by selection, here we examine available whole-genome sequences of SARS-CoV2 to infer patterns of mutation and selection on synonymous sites. Analysis of mutational profiles indicates a strong mutation bias towards T with concomitant selection against T. Accounting for dinucleotide effects reinforces this conclusion, observed TT content being a quarter of that expected under neutrality. A significantly different mutational profile at CDS sites that are not 4-fold degenerate is consistent with contemporaneous selection against T mutations more widely. Although selection against CpG dinucleotides is expected to drive synonymous site G+C content below mutational equilibrium, observed G+C content is slightly above equilibrium, possibly because of selection for higher expression. Consistent with gene-specific selection against CpG dinucleotides, we observe systematic differences of CpG content between SARS-CoV2 genes. We propose an evolutionarily informed gene-bespoke approach to attenuation that, unusually, seeks to increase usage of the already most common synonymous codons. Comparable analysis of H1N1 and Ebola finds that deviation of GC3 from neutral equilibrium is not a universal feature, cautioning against generalization of results.
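The notion of mutational equilibrium used in the abstract can be illustrated with a minimal sketch. It assumes a simple four-state model in which each nucleotide mutates to each other nucleotide at a fixed, site-independent rate, and it finds the nucleotide frequencies at which gains and losses balance. The function name `equilibrium_frequencies` and the rates in `hypothetical_rates` are illustrative assumptions, not the authors' method or estimates; in particular, this sketch ignores the dinucleotide effects that the abstract says matter.

```python
NUCS = "ACGT"


def equilibrium_frequencies(rates, n_iter=10000, tol=1e-12):
    """Return equilibrium nucleotide frequencies for a four-state mutation model.

    rates[(x, y)] is the relative rate at which nucleotide x mutates to y
    (per occurrence of x). Missing pairs are treated as rate 0.
    """
    # Total rate of leaving each nucleotide.
    out = {x: sum(rates.get((x, y), 0.0) for y in NUCS if y != x) for x in NUCS}
    max_out = max(out.values())
    if max_out <= 0:
        raise ValueError("at least one mutation rate must be positive")
    # Scale rates so that each step changes at most half of any class.
    scale = 0.5 / max_out

    freqs = {n: 0.25 for n in NUCS}
    for _ in range(n_iter):
        new = {}
        for y in NUCS:
            stay = freqs[y] * (1.0 - scale * out[y])
            inflow = sum(
                freqs[x] * scale * rates.get((x, y), 0.0) for x in NUCS if x != y
            )
            new[y] = stay + inflow
        converged = max(abs(new[n] - freqs[n]) for n in NUCS) < tol
        freqs = new
        if converged:
            break
    return freqs


# Hypothetical rates with a bias towards T (placeholders, not from the paper).
target_weight = {"A": 1.0, "C": 1.0, "G": 1.0, "T": 3.0}
hypothetical_rates = {
    (x, y): target_weight[y] for x in NUCS for y in NUCS if x != y
}

if __name__ == "__main__":
    eq = equilibrium_frequencies(hypothetical_rates)
    for nuc in NUCS:
        print(nuc, round(eq[nuc], 4))
```

Comparing such an equilibrium with the observed composition is the kind of contrast the abstract draws: an observed T content well below the mutational equilibrium would be consistent with selection against T.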

SciScore for 10.1101/2020.05.11.088112:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
CDSs were then translated using BioPython, re-aligned using MAFFT, and then reversed translated using TranslatorX (	BioPython suggested: (Biopython, RRID:SCR_007173) TranslatorX suggested: (TranslatorX, RRID:SCR_014733)
Additionally, H1N1 influenza A pdm09 sequences for strains collected between January 2009 and August 2010 that contained segments PB2, PB1, PA, HA, NP, NA, MP and NS were obtained from GISAID (Shu and McCauley 2017) for 4 segments: RNA polymerase subunit (PB2), hemagglutinin (HA), nucleoprotein (NP), and neuraminidase (NA).	August suggested: None
Remaining sequences were translated aligned to the reference strain using MAFFT, and reverse translated to nucleotides using TranslatorX (Abascal, et al. 2010).	MAFFT suggested: (MAFFT, RRID:SCR_011811)
Subsequently, the MSA and the resulting tree were used to identify recurrent mutations (homoplasies) using HomoplasyFinder (Crispell, et al. 2019).	HomoplasyFinder suggested: (HomoplasyFinder, RRID:SCR_017300)
These were solved in NumPy.	NumPy suggested: (NumPy, RRID:SCR_008633)

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We found bar graphs of continuous data. We recommend replacing bar graphs with more informative graphics, as many different datasets can lead to the same bar graph. The actual data may suggest different conclusions from the summary statistics. For more information, please see Weissgerber et al (2015).

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
