Viral population genomics reveals host and infectivity impact on SARS-CoV-2 adaptive landscape
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Public health surveillance, drug treatment development, and optimization of immunological interventions all depend on understanding pathogen adaptation, which differs for specific pathogens. SARS-CoV-2 is an exceptionally successful human pathogen, yet complete understanding of the forces driving its evolution is lacking. Here, we leveraged almost four million SARS-CoV-2 sequences originating mostly from non-vaccinated naïve patients to investigate the impact of functional constraints and natural immune pressures on the sequence diversity of the SARS-CoV-2 genome. Overall, we showed that the SARS-CoV-2 genome is under strong and intensifying levels of purifying selection with a minority of sites under diversifying pressure. With a particular focus on the spike protein, we showed that sites under selection were critical for protein stability and virus fitness related to increased infectivity and/or reduced neutralization by convalescent sera. We investigated the genetic diversity of SARS-CoV-2 B and T cell epitopes and determined that the currently known T cell epitope sequences were highly conserved. Outside of the spike protein, we observed that mutations under selection in variants of concern can be associated with beneficial outcomes for the virus. Altogether, the results yielded a comprehensive map of all sites under selection across the entirety of the SARS-CoV-2 genome, highlighting targets for future studies to better understand the virus's spread, evolution and success.
Article activity feed
SciScore for 10.1101/2021.12.30.474516: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
- Ethics: Field Sample Permit: For each time-point (2020-03-31, 2020-06-30, 2020-09-30, 2020-12-31, 2021-03-31, 2021-06-30, 2021-09-26), the sample collection date was used to generate a subset of sequences that were collected up to that date.
- Sex as a biological variable: not detected.
- Randomization: not detected.
- Blinding: not detected.
- Power Analysis: not detected.
Table 2: Resources
Software and Algorithms
- Sentence: GISAID genomes were filtered to remove poor quality sequences using the following criteria: complete metadata, less than 2% ambiguous DNA sequence (N) or protein (X) with no runs of 4 or more Ns in the genome or Xs in the proteins, all genes (ORFs) in the genome are matched by a BLAST search and were not truncated by premature stop codons.
  Resource: BLAST, suggested: (BLASTX, RRID:SCR_001653)
- Sentence: To obtain full length coding genomes, coding sequences of each open reading frame were aligned with 7911791 The genome sequences that passed QC were also filtered to remove accessions that are not represented in the September 26801 downloaded from GISAID, 427,773 sequences passed all of the QC filters and were also in the Audacity tree.
  Resource: Audacity, suggested: None
- Sentence: Images of annotated phylogenetic trees were rendered using the Python API to iTOL, the interactive Tree of Life [84].
  Resource: Python, suggested: (IPython, RRID:SCR_001658)
- Sentence: These frequencies were used to annotate the B-factor of PDB file 6VSB [85], with PDB ID 6M0J [86] superimposed on one protomer of 6VSB in order to provide resolution on residues in VUS which lacked resolution in 6VSB; the resulting chimeric structure was used to color each atom using PyMOL's spectrum command.
  Resource: PyMOL, suggested: None
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- No funding statement was detected.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.
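As an illustration of the ambiguity criteria quoted under Table 2 (less than 2% ambiguous characters and no runs of 4 or more Ns or Xs), a minimal Python sketch might look like the following. The function name, constants, and test sequences are hypothetical and are not taken from the preprint; the other quoted criteria (complete metadata, BLAST matching of all ORFs, and no premature stop codons) are not covered here.

```python
# Hypothetical sketch of the ambiguity QC criteria quoted in Table 2.
# Names and thresholds below are illustrative assumptions, not the authors' code.

MAX_AMBIGUOUS_FRACTION = 0.02  # "less than 2% ambiguous" sequence
MIN_FORBIDDEN_RUN = 4          # "no runs of 4 or more" ambiguous characters


def passes_ambiguity_qc(sequence: str, ambiguous: str = "N") -> bool:
    """Return True if a sequence satisfies both ambiguity criteria.

    Use ambiguous="N" for nucleotide sequences and ambiguous="X" for proteins.
    """
    seq = sequence.upper()
    if not seq:
        # An empty sequence cannot be assessed, so treat it as failing QC.
        return False

    # Criterion 1: the fraction of ambiguous characters is strictly below 2%.
    fraction = seq.count(ambiguous) / len(seq)

    # Criterion 2: no run of 4 or more consecutive ambiguous characters.
    has_long_run = (ambiguous * MIN_FORBIDDEN_RUN) in seq

    return fraction < MAX_AMBIGUOUS_FRACTION and not has_long_run


if __name__ == "__main__":
    clean = "ACGT" * 100
    print(passes_ambiguity_qc(clean))            # no ambiguous bases
    print(passes_ambiguity_qc(clean + "NNNN"))   # contains a run of four Ns
    print(passes_ambiguity_qc("MKV" * 50 + "XX", ambiguous="X"))
```

Checking the run length separately from the overall fraction matters because a short run of Ns can fall well under the 2% threshold on a genome-length sequence while still hiding several consecutive codons.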
