Viral population genomics reveals host and infectivity impact on SARS-CoV-2 adaptive landscape

Abstract

Public health surveillance, drug treatment development, and optimization of immunological interventions all depend on understanding pathogen adaptation, which differs for specific pathogens. SARS-CoV-2 is an exceptionally successful human pathogen, yet a complete understanding of the forces driving its evolution is lacking. Here, we leveraged almost four million SARS-CoV-2 sequences originating mostly from non-vaccinated naïve patients to investigate the impact of functional constraints and natural immune pressures on the sequence diversity of the SARS-CoV-2 genome. Overall, we showed that the SARS-CoV-2 genome is under strong and intensifying levels of purifying selection, with a minority of sites under diversifying pressure. With a particular focus on the spike protein, we showed that sites under selection were critical for protein stability and virus fitness related to increased infectivity and/or reduced neutralization by convalescent sera. We investigated the genetic diversity of SARS-CoV-2 B and T cell epitopes and determined that the currently known T cell epitope sequences were highly conserved. Outside of the spike protein, we observed that mutations under selection in variants of concern can be associated with beneficial outcomes for the virus. Altogether, the results yielded a comprehensive map of all sites under selection across the entirety of the SARS-CoV-2 genome, highlighting targets for future studies to better understand the virus's spread, evolution and success.

Article activity feed

  1. SciScore for 10.1101/2021.12.30.474516

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Ethics: Field Sample Permit: For each time-point (2020-03-31, 2020-06-30, 2020-09-30, 2020-12-31, 2021-03-31, 2021-06-30, 2021-09-26), the sample collection date was used to generate a subset of sequences that were collected up to that date.
    Sex as a biological variable: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
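
The cumulative time-point subsetting quoted in the Ethics row can be sketched as follows. This is a minimal illustration, not the authors' code: the `records` structure of (accession, collection date) pairs and the function name `cumulative_subsets` are hypothetical, and treating "up to that date" as inclusive is an assumption.

```python
from datetime import date

# Time-points as listed in the Rigor table.
TIME_POINTS = [
    date(2020, 3, 31), date(2020, 6, 30), date(2020, 9, 30), date(2020, 12, 31),
    date(2021, 3, 31), date(2021, 6, 30), date(2021, 9, 26),
]


def cumulative_subsets(records, time_points=TIME_POINTS):
    """Map each time-point to the accessions collected on or before it.

    `records` is a hypothetical iterable of (accession, collection_date)
    pairs; the inclusive comparison (<=) is an assumption.
    """
    records = list(records)  # allow generators to be scanned once per cutoff
    return {
        cutoff: [acc for acc, collected in records if collected <= cutoff]
        for cutoff in time_points
    }
```

Because every subset contains all earlier sequences, the subsets are nested: each later time-point is a superset of the previous one.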

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    GISAID genomes were filtered to remove poor quality sequences using the following criteria: complete metadata, less than 2% ambiguous DNA sequence (N) or protein (X) with no runs of 4 or more Ns in the genome or Xs in the proteins, all genes (ORFs) in the genome are matched by a BLAST search and were not truncated by premature stop codons.
    BLAST
    suggested: (BLASTX, RRID:SCR_001653)
    To obtain full length coding genomes, coding sequences of each open reading frame were aligned with 7911791 The genome sequences that passed QC were also filtered to remove accessions that are not represented in the September 26801 downloaded from GISAID, 427,773 sequences passed all of the QC filters and were also in the Audacity tree.
    Audacity
    suggested: None
    Images of annotated phylogenetic trees were rendered using the Python API to iTOL, the interactive Tree of Life84.
    Python
    suggested: (IPython, RRID:SCR_001658)
    These frequencies were used to annotate the B-factor of PDB file 6VSB85 with PDB ID 6M0J86 superimposed on one protomer of 6VSB in order to provide resolution on residues in VUS which lacked resolution in 6VSB. The resulting chimeric structure was used to color each atom using PyMOL’s spectrum command.
    PyMOL’s
    suggested: None
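
The ambiguity criteria in the first quoted sentence of Table 2 (less than 2% ambiguous symbols and no runs of 4 or more) can be sketched as a simple check. This is a hedged illustration, not the authors' pipeline: the function name `passes_ambiguity_qc` is hypothetical, and the other stated criteria (complete metadata, BLAST matching of all ORFs, absence of premature stop codons) are not covered here.

```python
import re

MAX_AMBIGUOUS_FRACTION = 0.02  # "less than 2% ambiguous" per the quoted sentence
MIN_REJECTED_RUN = 4           # "no runs of 4 or more" ambiguous symbols


def passes_ambiguity_qc(seq, ambiguous="N"):
    """Return True if `seq` meets both ambiguity criteria.

    Use ambiguous="N" for nucleotide sequences and ambiguous="X" for
    protein sequences. Empty sequences are rejected (an assumption).
    """
    seq = seq.upper()
    if not seq:
        return False
    fraction = seq.count(ambiguous) / len(seq)
    # Pattern such as "N{4,}" matches any run of 4 or more ambiguous symbols.
    run_pattern = re.escape(ambiguous) + "{%d,}" % MIN_REJECTED_RUN
    has_long_run = re.search(run_pattern, seq) is not None
    return fraction < MAX_AMBIGUOUS_FRACTION and not has_long_run
```

The two conditions are independent: a sequence with only four Ns can still fail if they are contiguous, and a sequence with scattered single Ns fails once they reach 2% of its length.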

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.