Following the Trail of One Million Genomes: Footprints of SARS-CoV-2 Adaptation to Humans
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has accumulated genomic mutations at an approximately linear rate since it first infected human populations in late 2019. Controversies remain regarding the identity, proportion, and effects of adaptive mutations as SARS-CoV-2 evolves from a bat- to a human-adapted virus. The potential for vaccine-escape mutations poses additional challenges in pandemic control. Despite being of great interest to therapeutic and vaccine development, human-adaptive mutations in SARS-CoV-2 are masked by a genome-wide linkage disequilibrium under which neutral and even deleterious mutations can reach fixation by chance or through hitchhiking. Furthermore, genome-wide linkage disequilibrium imposes clonal interference by which multiple adaptive mutations compete against one another. Informed by insights from microbial experimental evolution, we analyzed close to one million SARS-CoV-2 genomes sequenced during the first year of the COVID-19 pandemic and identified putative human-adaptive mutations according to the rates of synonymous and missense mutations, temporal linkage, and mutation recurrence. Furthermore, we developed a forward-evolution simulator with the realistic SARS-CoV-2 genome structure and base substitution probabilities able to predict viral genome diversity under neutral, background selection, and adaptive evolutionary models. We conclude that adaptive mutations have emerged early, rapidly, and constantly to dominate SARS-CoV-2 populations despite clonal interference and purifying selection. Our analysis underscores a need for genomic surveillance of mutation trajectories at the local level for early detection of adaptive and immune-escape variants. Putative human-adaptive mutations are over-represented in viral proteins interfering with host immunity and binding host-cell receptors and thus may serve as priority targets for designing therapeutics and vaccines against human-adapted forms of SARS-CoV-2.
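To make the linkage-disequilibrium notion in the abstract concrete, the following is a minimal sketch of the standard r^2 statistic between two biallelic sites. It is not the authors' pipeline. The function name `ld_r2` and the 0/1 allele encoding (0 = reference, 1 = variant, one entry per genome) are assumptions made for illustration, and each genome is treated as a single haplotype.

```python
import math


def ld_r2(site_a, site_b):
    """Return r^2 between two biallelic sites.

    site_a, site_b: equal-length sequences of 0/1 alleles, one per genome
    (hypothetical encoding: 0 = reference allele, 1 = variant allele).
    """
    if len(site_a) != len(site_b) or len(site_a) == 0:
        raise ValueError("sites must be non-empty and of equal length")

    n = len(site_a)
    p_a = sum(site_a) / n  # variant frequency at site A
    p_b = sum(site_b) / n  # variant frequency at site B
    # Frequency of genomes carrying the variant at both sites
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan  # r^2 is undefined when either site is monomorphic
    return d * d / denom
```

Two sites whose variants always occur together give r^2 = 1, while sites whose variants are distributed independently give r^2 = 0. Under near-complete genome-wide linkage, most pairs of variant sites would be expected to sit close to the first case, which is why hitchhiking neutral mutations are hard to tell apart from adaptive ones on frequency alone.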
Article activity feed
SciScore for 10.1101/2021.05.07.443114:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms (Sentences and Resources)
- Sentence: Unique haplotypes were obtained with custom Perl scripts based on the BioPerl package (Stajich et al. 2002).
  Resource: BioPerl; suggested: (BioPerl, RRID:SCR_002989)
- Sentence: A custom Python script sampled viral genomes (e.g., n=100) by month and at three spatial scales (continent, country, and state).
  Resource: Python; suggested: (IPython, RRID:SCR_001658)
- Sentence: Evolutionary statistics, including variant frequencies, linkage disequilibrium (r^2), haplotypes, and base substitution frequencies were generated with programs BCFTools and VCFTools (Danecek et al. 2011).
  Resource: VCFTools; suggested: (VCFtools, RRID:SCR_001235)
- Sentence: We used Haploview (version 4.2) to calculate LD scores (D’ and r^2) as well as their statistical significance between pairs of SNVs (Barrett 2009).
  Resource: Haploview; suggested: (Haploview, RRID:SCR_003076)
- Sentence: We used the DNAPARS program of the PHYLIP (version 3.696) package to search for a maximum parsimony tree of unique haplotypes, obtaining the homoplasy index (HI) and the number of base substitutions at each SNV site (Felsenstein 1989).
  Resource: PHYLIP; suggested: (PHYLIP, RRID:SCR_006244)
- Sentence: To ensure that all genomic sites were mutated at least once, we ran CovSimulator ten times such that the chance of a site not undergoing any mutation was small (p = 0.512^10 = 1.25e-3).
  Resource: CovSimulator; suggested: None
- Sentence: The R package pheatmap was used to generate heatmaps.
  Resource: pheatmap; suggested: (pheatmap, RRID:SCR_016418)
Results from OddPub: Thank you for sharing your code and data.
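The CovSimulator replicate count quoted in the resources table can be checked with a short calculation. This is a minimal sketch, assuming that 0.512 is the per-run probability that a given site stays unmutated and that the ten runs are independent. The function name `prob_site_never_mutated` is hypothetical and is not part of CovSimulator.

```python
def prob_site_never_mutated(p_unmutated_per_run, n_runs):
    """Probability that a site escapes mutation in every one of n_runs runs.

    Assumes the runs are independent, so the per-run probabilities multiply.
    """
    if not 0.0 <= p_unmutated_per_run <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if n_runs < 0:
        raise ValueError("n_runs must be non-negative")
    return p_unmutated_per_run ** n_runs


# Values taken from the quoted sentence: 0.512 per run, ten runs
p_escape = prob_site_never_mutated(0.512, 10)
```

With these inputs `p_escape` comes out at roughly 1.24e-3, in line with the reported value of about 1.25e-3, so under these assumptions nearly every genomic site is mutated at least once across the ten simulations.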
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.