Temporal evolution and adaptation of SARS-CoV-2 codon usage

Abstract

Background: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) first emerged in Wuhan, China, in December 2019. Since the outbreak, it has accumulated mutations in its coding sequences to optimize its adaptation to the human host. Identifying its genetic variants has become crucial for tracking and evaluating their spread across the globe.

Methods: In this study, we compared 320,338 SARS-CoV-2 genomes isolated worldwide with the first genome sequenced in Wuhan, China. To this end, we analysed over time the codon usage patterns of the SARS-CoV-2 genes encoding the membrane protein (M), envelope protein (E), spike surface glycoprotein (S), nucleoprotein (N), RNA-dependent RNA polymerase (RdRp) and ORF1ab.

Results: We found that the genes coding for proteins N and S have diverged more rapidly since the outbreak by accumulating mutations. Interestingly, all genes show a deoptimization of their codon usage with respect to the human host. Our findings suggest a general evolutionary trend in which SARS-CoV-2 evolves towards a sub-optimal codon usage bias to favour host survival and, thereby, its own spread. Furthermore, we found that the S protein and RdRp are more strongly subject to purifying selection, which increases over time; this implies that these proteins will become less tolerant of mutations. In contrast, proteins N and M tend to evolve more under the action of mutational bias, thus exploring a larger region of their sequence space.

Conclusions: Overall, our study sheds more light on the evolution of SARS-CoV-2 genes and their adaptation to humans, helping to foresee their mutation patterns and the emergence of new variants.
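Listing the mutations a protein has accumulated relative to the Wuhan reference amounts to aligning the two sequences and reading off the mismatched columns. The SciScore report below states that the protein sequences were aligned with Biopython (whose `Bio.Align.PairwiseAligner` class provides pairwise alignment); to stay dependency-free, this sketch swaps in a textbook Needleman-Wunsch global alignment with toy scores (match +1, mismatch -1, linear gap -2) rather than a substitution matrix such as BLOSUM62. The scores and the helper names `global_align` and `substitutions` are assumptions for illustration, not the authors' settings.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty (toy scores)."""
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def substitutions(reference: str, query: str) -> list[tuple[int, str, str]]:
    """(1-based reference position, reference residue, query residue) per mismatch."""
    ref_aln, qry_aln, _ = global_align(reference, query)
    found, pos = [], 0
    for r, q in zip(ref_aln, qry_aln):
        if r != "-":
            pos += 1  # advance only on reference residues
        # Gap columns (insertions/deletions) are not reported as substitutions.
        if r != "-" and q != "-" and r != q:
            found.append((pos, r, q))
    return found


# Toy sequences, not real SARS-CoV-2 proteins.
print(substitutions("KDIAD", "KGIAD"))  # [(2, 'D', 'G')]
```

The dynamic-programming table grows with the product of the two sequence lengths, which is manageable for a single protein such as N or S; at the scale of hundreds of thousands of genomes, an optimized library such as Biopython is the practical choice.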

SciScore for 10.1101/2020.05.29.123976

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Experimental Models: Organisms/Strains
Sentences	Resources
We downloaded these PPI from NDEx (https://public.ndexbio.org/network/43803262-6d69-11ea-bfdc-0ac135e8bacf).	https://public.ndexbio.org/network/43803262-6d69-11ea-bfdc-0ac135e8bacf suggested: None
Software and Algorithms
Sentences	Resources
2.2 Relative Synonymous Codon Usage: RSCU vectors for all the genomes were computed by using an in-house Python script, following the formula $\mathrm{RSCU}_i = \frac{X_i}{\frac{1}{n_i}\sum_{j=1}^{n_i} X_j}$, where $X_i$ is the number of occurrences, in a given genome, of codon $i$, and the sum in the denominator runs over its $n_i$ synonymous codons.	Python suggested: (IPython, RRID:SCR_001658)
We then showed the average values of the distance over time with a heatmap, drawn with MATLAB.	MATLAB suggested: (MATLAB, RRID:SCR_001622)
The protein sequences were aligned using Biopython.	Biopython suggested: (Biopython, RRID:SCR_007173)
To detect communities of PPI, we used the application Molecular Complex Detection (MCODE) [32] in Cytoscape (https://cytoscape.org/).	Cytoscape suggested: (Cytoscape, RRID:SCR_003032) https://cytoscape.org/ suggested: (CluePedia Cytoscape plugin, RRID:SCR_015784)
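The RSCU definition quoted in the table can be illustrated with a short Python sketch. The authors' in-house script is not shared, so this is not their code; it assumes the standard genetic code (NCBI translation table 1), an in-frame coding sequence, exclusion of stop codons, skipping of codons that contain ambiguous bases such as N, and an RSCU of 0 for amino acids that never occur (a hypothetical convention, since the formula is undefined there). The name `rscu` is illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Synonymous codon families keyed by amino acid; stop codons ("*") are excluded.
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)


def rscu(cds: str) -> dict[str, float]:
    """RSCU_i = X_i / ((1/n_i) * sum of X_j over the n_i synonyms of codon i)."""
    seq = cds.upper().replace("U", "T")
    # Count in-frame codons; triplets with ambiguous bases (e.g. N) are skipped.
    counts = Counter(
        seq[k:k + 3] for k in range(0, len(seq) - 2, 3) if seq[k:k + 3] in CODON_TABLE
    )
    values = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        n = len(codons)
        for c in codons:
            # Hypothetical convention: 0.0 when the amino acid never occurs.
            values[c] = n * counts[c] / total if total else 0.0
    return values


# Alanine family (GCT, GCC, GCA, GCG) observed 2, 1, 1 and 0 times.
print(rscu("ATGGCTGCTGCCGCATAA")["GCT"])  # 2.0
```

Each sense codon receives a value, so a gene or genome maps to a 61-dimensional RSCU vector; an RSCU of 1 means the codon is used exactly as often as expected under uniform synonymous usage.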

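The heatmap row describes averaging a distance over time, and the abstract describes comparing every genome with the first Wuhan genome. A minimal sketch of that aggregation follows, assuming per-genome RSCU vectors in a fixed codon order, a Euclidean distance to the reference vector, and monthly bins labelled "YYYY-MM". The report names neither the metric nor the time bin, so both are assumptions, and `monthly_mean_distance` is a hypothetical helper. Running it once per gene would yield one heatmap row per gene, with months as columns.

```python
import math
from collections import defaultdict
from statistics import mean


def monthly_mean_distance(samples, reference):
    """Mean distance of per-genome vectors to a reference vector, by month.

    samples:   iterable of (month, vector) pairs, e.g. ("2020-03", [...]).
    reference: vector of the reference genome, in the same codon order.
    Euclidean distance is an assumption; the source does not name the metric.
    """
    by_month = defaultdict(list)
    for month, vector in samples:
        by_month[month].append(math.dist(vector, reference))
    # Sorting "YYYY-MM" strings keeps the heatmap columns in chronological order.
    return {month: mean(dists) for month, dists in sorted(by_month.items())}


# Toy two-dimensional vectors; real RSCU vectors would have 61 entries.
reference = [1.0, 1.0]
samples = [("2020-02", [4.0, 5.0]), ("2020-01", [1.0, 1.0]), ("2020-02", [1.0, 1.0])]
print(monthly_mean_distance(samples, reference))  # {'2020-01': 0.0, '2020-02': 2.5}
```

Swapping `math.dist` for another metric (for example, a cosine distance) changes only the line inside the loop.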
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
No funding statement was detected.
No protocol registration statement was detected.

Related articles

DIVERSITY AND CLINICAL CORRELATIONS OF SARS-CoV-2 VARIANT DURING THE INTRODUCTION OF THE DELTA VARIANT IN GUATEMALA

Claudia Carranza
Lucia Ortiz
Maria Eugenia Castellanos
Ana Silvia Gonzalez-Reiche
Renata Mendizabal-Cabrera
Zain Khalil
Adriana van DeGuchte
Keith Farrugia
Mariana Herrera
Ernesto Mena
Celia Cordon-Rosales
Harm van Bakel
Daniel R. Perez

Reviewed by Access Microbiology

Genomic characterization of SARS-CoV-2 variants circulating in the population of Bangui, Central African Republic (CAR) in 2022.

Pulchérie Pelembi
Philippe Colson
Alain Farra
Ornella Anne Sibiro-Demi
Christian Noël Malaka
Aurélia Kwasiborski
Véronique Hourdel
Gilles Landry Ngaya
Romaric Nzoumbou-Boko
Jean-Claude Manuguerra
Emmanuel Ryvalin Nakoune-Yandoko
Guy VERNET
Bernard La Scola
Valérie Caro
Alexandre Manirakiza

Rapid Phylogenomic Analysis of Thousands Outbreak‐Causing Viral Genomes Using Covary

Marvin I. De los Santos