Synonymous mutations and the molecular evolution of SARS-CoV-2 origins
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is most closely related, by average genetic distance, to two coronaviruses isolated from bats, RaTG13 and RmYN02. However, there is a segment of high amino acid similarity between human SARS-CoV-2 and a pangolin-isolated strain, GD410721, in the receptor-binding domain (RBD) of the spike protein, a pattern that can be caused either by recombination or by convergent amino acid evolution driven by natural selection. We perform a detailed analysis of the synonymous divergence, which is less likely to be affected by selection than amino acid divergence, between human SARS-CoV-2 and related strains. We show that, in the RBD, the synonymous divergence between the bat-derived viruses and SARS-CoV-2 is larger than that between GD410721 and SARS-CoV-2, providing strong additional support for the recombination hypothesis. However, the synonymous divergence between the pangolin strain and SARS-CoV-2 is also relatively high, which is not consistent with a recent recombination between them; instead, it suggests a recombination into RaTG13. We also find a 14-fold increase in the dN/dS ratio from the lineage leading to SARS-CoV-2 to the strains of the current pandemic, suggesting that the vast majority of nonsynonymous mutations currently segregating within the human strains have a negative impact on viral fitness. Finally, based on synonymous divergence, we estimate the time to the most recent common ancestor of SARS-CoV-2 with RaTG13 and with RmYN02 to be 51.71 years (95% CI, 28.11–75.31) and 37.02 years (95% CI, 18.19–55.85), respectively.
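The distinction the abstract draws between synonymous and nonsynonymous change can be illustrated with a minimal Python sketch. It only classifies codon differences between two aligned, in-frame coding sequences using the standard genetic code. It is not the PAML estimation used by the authors: it does not normalize by the number of synonymous and nonsynonymous sites and does not correct for multiple hits, so the counts are not dN or dS. The function name `count_codon_differences` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Assumes both sequences are aligned, in frame, and of equal length.
    Codons containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a) - 2, 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguity code: not comparable
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Toy example: CTT -> CTC keeps leucine; AAA -> AGA changes lysine to arginine.
syn, nonsyn = count_codon_differences("ATGCTTAAA", "ATGCTCAGA")
```

Because synonymous differences leave the protein unchanged, they are less exposed to selection on the amino acid sequence, which is why the abstract treats them as the more reliable signal of shared ancestry.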
Article activity feed
SciScore for 10.1101/2020.04.20.052019:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms

Sentence: BLAST searches: Sequences for blast databases were downloaded on March 26, 2020 from the following sources: EMBL nucleotide libraries for virus (ftp://ftp.ebi.ac.uk/pub/databases/embl/release/std), NCBI Virus Genomes, NCBI Influenza Genomes (ftp://ftp.ncbi.nlm.nih.gov/genomes/INFLUENZA/), all Whole Genome Shotgun (https://www.ncbi.nlm.nih.gov/genbank/wgs/) assemblies under taxonomy ID 10239, along with GISAID Epiflu and EpiCoV databases.
Resources:
- BLAST; suggested: (BLASTX, RRID:SCR_001653)
- Influenza Genomes; suggested: None
- https://www.ncbi.nlm.nih.gov/genbank/wgs/; suggested: (Whole Genome Shotgun (WGS Project, RRID:SCR_016637)

Sentence: The genome alignments were performed using MAFFT (v7.450) (Katoh and Standley 2013) with parameters “--maxiterate 1000 --localpair”.
Resources:
- MAFFT; suggested: (MAFFT, RRID:SCR_011811)

Sentence: The coding sequences of each gene were aligned using PRANK (Loytynoja 2014) (v.170427) with parameters “-codon -F”.
Resources:
- PRANK; suggested: (prank, RRID:SCR_017228)

Sentence: The NJ tree was estimated using the ‘neighbor’ software from the PHYLIP package (Felsenstein 2009).
Resources:
- PHYLIP; suggested: (PHYLIP, RRID:SCR_006244)

Sentence: Estimation of sequence divergence in 300-bp windows: dN and dS were estimated using two different methods implemented in the PAML package (Yang 2007)
Resources:
- PAML; suggested: (PAML, RRID:SCR_014932)

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
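The windowed divergence analysis quoted in the Resources table can be pictured with a small Python sketch. It splits a pairwise alignment into consecutive 300-bp windows and reports the raw mismatch count in each. Two assumptions are made here that the quoted sentence does not confirm: the windows are treated as non-overlapping, and raw nucleotide mismatches stand in for the dN and dS values that the authors estimated with PAML. The function name `window_mismatches` is hypothetical.

```python
def window_mismatches(seq_a, seq_b, window=300):
    """Summarize an aligned pair of sequences in consecutive windows.

    Returns a list of (start, end, mismatches, compared_sites) tuples.
    Alignment columns with a gap in either sequence are not compared.
    Windows are non-overlapping (an assumption); the last may be shorter.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a), window):
        end = min(start + window, len(seq_a))
        compared = 0
        mismatches = 0
        for base_a, base_b in zip(seq_a[start:end], seq_b[start:end]):
            if base_a == "-" or base_b == "-":
                continue  # skip gapped columns
            compared += 1
            if base_a != base_b:
                mismatches += 1
        results.append((start, end, mismatches, compared))
    return results


# Toy alignment: 600 columns with a single difference at the end of window 1.
reference = "A" * 600
query = "A" * 299 + "C" + "A" * 300
summary = window_mismatches(reference, query)
```

A window size of 300 bp is a multiple of three, so each window holds 100 whole codons when the alignment starts in frame, which is what makes per-window dN and dS estimates meaningful.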
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
SciScore for 10.1101/2020.03.02.973255:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms

Sentence: We used TRIMMOMATIC (59) to trim the reads of those samples to 100 bp, with the following command line: We aligned the FASTQ files using Burrows-Wheeler Aligner (BWA) (60) using the official sequence of SARS-CoV-2 (NC_045512.2) as reference genome.
Resources:
- TRIMMOMATIC; suggested: (Trimmomatic, RRID:SCR_011848)

Sentence: After the alignments BAM files were sorted them using SAMtools (
Resources:
- SAMtools; suggested: (SAMTOOLS, RRID:SCR_002105)

Sentence: Due to a high error rate reported by QUALIMAP, samples SRR11059943 and SRR10971381 have been removed from the analysis.
Resources:
- QUALIMAP; suggested: (QualiMap, RRID:SCR_001209)

Sentence: To avoid potential artifacts due to strand bias, we used the AS_StrandOddsRatio parameter calculated following GATK guidelines (https://gatk.broadinstitute.org/hc/en-us/articles/360040507111-AS-StrandOddsRatio), and any mutation with a AS_StrandOddsRatio > 4 has been removed from the dataset.
Resources:
- GATK; suggested: (GATK, RRID:SCR_001876)

Sentence: Bcftools (61) has been used to calculate total allelic depths on the forward and reverse strand (ADF, ADR) for AS_StrandOddsRatio calculation, with the following command line: Mutations common to the datasets generated by Reditools 2 and JACUSA were considered (n = 910, Fig.
Resources:
- Reditools; suggested: (REDItools, RRID:SCR_012133)

Sentence: Data manipulation: R packages (Biostrings, rsamtools, ggseqlogo ggplot2, splitstackshape) and custom Perl scripts were used to handle the data.
Resources:
- ggplot2; suggested: (ggplot2, RRID:SCR_014601)

Sentence: SARS-CoV-2, SARS and MERS genomic data were prepared for the Logi alignment using the GenomicRanges R package (63)
Resources:
- GenomicRanges; suggested: (GenomicRanges, RRID:SCR_000025)

Sentence: Consensus sequences of SARS and MERS genomes were built using the “cons” tool from the EMBOSS suite (http://bioinfo.nhri.org.tw/gui/) with default settings.
Resources:
- EMBOSS; suggested: (EMBOSS, RRID:SCR_008493)

Sentence: SARS-CoV-2 genomic sequences were downloaded from GISAID (https://www.gisaid.org/) and aligned with MUSCLE (64).
Resources:
- MUSCLE; suggested: (MUSCLE, RRID:SCR_011812)

Results from OddPub: Thank you for sharing your data.
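The strand-bias filter quoted in the Resources table (remove any mutation with AS_StrandOddsRatio > 4, computed from forward and reverse allelic depths) can be sketched in Python as follows. The formula is the strand odds ratio as it is described in the GATK StrandOddsRatio documentation to the best of our understanding, with a pseudocount of 1 added to each cell; it is not taken from the manuscript's own scripts, whose command lines are not reproduced in the feed. The function names and the mapping of the ADF and ADR depths to the four arguments are assumptions made for illustration.

```python
import math


def strand_odds_ratio(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Symmetric strand odds ratio on the natural-log scale.

    Inputs are read depths for the reference and alternate allele on the
    forward and reverse strands (for example, taken from ADF and ADR).
    A pseudocount of 1 is added to each cell to avoid division by zero.
    """
    rf, rr, af, ar = (x + 1.0 for x in (ref_fwd, ref_rev, alt_fwd, alt_rev))
    # Sum of the odds ratio and its inverse, so the statistic is symmetric.
    symmetric = (rf * ar) / (rr * af) + (rr * af) / (rf * ar)
    # Per-allele strand balance: 1.0 when both strands are equally covered.
    ref_ratio = min(rf, rr) / max(rf, rr)
    alt_ratio = min(af, ar) / max(af, ar)
    return math.log(symmetric) + math.log(ref_ratio) - math.log(alt_ratio)


def passes_strand_filter(ref_fwd, ref_rev, alt_fwd, alt_rev, threshold=4.0):
    """Keep a mutation only if its strand odds ratio does not exceed the threshold."""
    return strand_odds_ratio(ref_fwd, ref_rev, alt_fwd, alt_rev) <= threshold


# Balanced support on both strands gives ln(2), well under the cutoff of 4.
balanced = strand_odds_ratio(10, 10, 10, 10)
# An alternate allele seen only on the forward strand is filtered out.
one_sided_kept = passes_strand_filter(50, 50, 50, 0)
```

The statistic grows when the alternate allele is supported mostly by one strand while the reference allele is balanced, which is the signature of a strand-specific sequencing artifact rather than a real variant.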
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.