Following the Trail of One Million Genomes: Footprints of SARS-CoV-2 Adaptation to Humans

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has accumulated genomic mutations at an approximately linear rate since it first infected human populations in late 2019. Controversies remain regarding the identity, proportion, and effects of adaptive mutations as SARS-CoV-2 evolves from a bat- to a human-adapted virus. The potential for vaccine-escape mutations poses additional challenges in pandemic control. Despite being of great interest to therapeutic and vaccine development, human-adaptive mutations in SARS-CoV-2 are masked by a genome-wide linkage disequilibrium under which neutral and even deleterious mutations can reach fixation by chance or through hitchhiking. Furthermore, genome-wide linkage disequilibrium imposes clonal interference by which multiple adaptive mutations compete against one another. Informed by insights from microbial experimental evolution, we analyzed close to one million SARS-CoV-2 genomes sequenced during the first year of the COVID-19 pandemic and identified putative human-adaptive mutations according to the rates of synonymous and missense mutations, temporal linkage, and mutation recurrence. Furthermore, we developed a forward-evolution simulator with the realistic SARS-CoV-2 genome structure and base substitution probabilities able to predict viral genome diversity under neutral, background selection, and adaptive evolutionary models. We conclude that adaptive mutations have emerged early, rapidly, and constantly to dominate SARS-CoV-2 populations despite clonal interference and purifying selection. Our analysis underscores a need for genomic surveillance of mutation trajectories at the local level for early detection of adaptive and immune-escape variants. Putative human-adaptive mutations are over-represented in viral proteins interfering with host immunity and binding host-cell receptors and thus may serve as priority targets for designing therapeutics and vaccines against human-adapted forms of SARS-CoV-2.

Article activity feed

  1. SciScore for 10.1101/2021.05.07.443114:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Unique haplotypes were obtained with custom Perl scripts based on the BioPerl package (Stajich et al. 2002).
    BioPerl
    suggested: (BioPerl, RRID:SCR_002989)
    A custom Python script sampled viral genomes (e.g., n=100) by month and at three spatial scales (continent, country, and state).
    Python
    suggested: (IPython, RRID:SCR_001658)
    Evolutionary statistics, including variant frequencies, linkage disequilibrium (r²), haplotypes, and base substitution frequencies were generated with programs BCFTools and VCFTools (Danecek et al. 2011).
    VCFTools
    suggested: (VCFtools, RRID:SCR_001235)
    We used Haploview (version 4.2) to calculate LD scores (D’ and r²) as well as their statistical significance between pairs of SNVs (Barrett 2009).
    Haploview
    suggested: (Haploview, RRID:SCR_003076)
    We used the DNAPARS program of the PHYLIP (version 3.696) package to search for a maximum parsimony tree of unique haplotypes, obtaining the homoplasy index (HI) and the number of base substitutions at each SNV site (Felsenstein 1989).
    PHYLIP
    suggested: (PHYLIP, RRID:SCR_006244)
    To ensure that all genomic sites were mutated at least once, we ran CovSimulator ten times such that the chance of a site not undergoing any mutation was small (p = 0.512^10 = 1.25e-3).
    CovSimulator
    suggested: None
    The R package pheatmap was used to generate heatmaps.
    pheatmap
    suggested: (pheatmap, RRID:SCR_016418)
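
The arithmetic in the CovSimulator sentence quoted in the table can be checked with a short sketch. It assumes that the quoted figure reads as 0.512 raised to the tenth power (one factor per run) and that the ten runs are independent; the function name `prob_site_never_mutated` is hypothetical and is not part of CovSimulator. Under those assumptions the product comes out at roughly 1.24e-3, close to the quoted 1.25e-3; the small gap presumably reflects rounding of the per-run value.

```python
def prob_site_never_mutated(p_single_run: float, n_runs: int) -> float:
    """Probability that a site escapes mutation in every one of n_runs runs.

    Assumes the runs are independent, so the per-run probabilities multiply.
    """
    if not 0.0 <= p_single_run <= 1.0:
        raise ValueError("p_single_run must be a probability in [0, 1]")
    if n_runs < 0:
        raise ValueError("n_runs must be non-negative")
    return p_single_run ** n_runs


if __name__ == "__main__":
    # 0.512 is the per-run probability as read from the quoted sentence
    # (an interpretation of the garbled "0.51210"), with ten simulator runs.
    p = prob_site_never_mutated(0.512, 10)
    print(f"P(site never mutated in 10 runs) = {p:.3e}")
```

The same function shows how quickly the miss probability falls with additional runs, for example by comparing `prob_site_never_mutated(0.512, 5)` with the ten-run value.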

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.