Inferring selection effects in SARS-CoV-2 with Bayesian Viral Allele Selection

Abstract

The global effort to sequence millions of SARS-CoV-2 genomes has provided an unprecedented view of viral evolution. Characterizing how selection acts on SARS-CoV-2 is critical to developing effective, long-lasting vaccines and other treatments, but the scale and complexity of genomic surveillance data make rigorous analysis challenging. To meet this challenge, we develop Bayesian Viral Allele Selection (BVAS), a principled and scalable probabilistic method for inferring the genetic determinants of differential viral fitness and the relative growth rates of viral lineages, including newly emergent lineages. After demonstrating the accuracy and efficacy of our method through simulation, we apply BVAS to 6.9 million SARS-CoV-2 genomes. We identify numerous mutations that increase fitness, including previously identified mutations in the SARS-CoV-2 Spike and Nucleocapsid proteins, as well as mutations in non-structural proteins whose contribution to fitness is less well characterized. In addition, we extend our baseline model to identify mutations whose fitness exhibits strong dependence on vaccination status as well as pairwise interaction effects, i.e. epistasis. Strikingly, both these analyses point to the pivotal role played by the N501 residue in the Spike protein. Our method, which couples Bayesian variable selection with a diffusion approximation in allele frequency space, lays a foundation for identifying fitness-associated mutations under the assumption that most alleles are neutral.

Article activity feed

  1. SciScore for 10.1101/2022.05.07.490748:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    We highlight the following limitations. Estimating the effective population size v is challenging, especially since v can exhibit significant variability across time. While we have argued that sensitivity to v is fairly moderate, improved v estimates should lead to improved statistical efficiency, especially if v can be estimated with finer spatial and temporal granularity. Doing so would likely require incorporating additional sources of data (e.g. case counts) and represents an important direction for future work. Several of the simplifying assumptions that underlie BVAS are expected to be violated at some level in real-world data. Notably, BVAS assumes that fitness depends linearly on genotype, Eqn. 2, so that it is unable to account for epistasis, e.g. pairwise interactions between amino acids. Given growing evidence for epistasis in SARS-CoV-2 (Starr et al., 2022), it would be interesting to incorporate epistasis into BVAS, and we believe that our rigorous approach to inducing sparsity could be an ideal starting point for doing so effectively. In practice this would likely require making biologically informed assumptions that reduce the space of selection effects considered, e.g. limiting to pairs of mutations that are near each other in space. Applying BVAS to 6.9 million SARS-CoV-2 genomes provides a detailed picture of viral selection in action. Comparisons to PyR0 and MAP are in broad qualitative agreement, suggesting that all three methods are capable of identifyin...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
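
The limitations quoted in the SciScore report above contrast a fitness model that is linear in genotype with one that also allows pairwise interactions (epistasis). The following is a minimal illustrative sketch of that distinction only, not the BVAS implementation. The function names, the 0/1 genotype encoding, and the coefficient values (`beta`, `gamma`) are all hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def additive_fitness(genotype, beta):
    """Linear model: sum the effect of every mutation present (1) in the genotype."""
    return sum(b * g for b, g in zip(beta, genotype))


def pairwise_fitness(genotype, beta, gamma):
    """Linear term plus an interaction effect for each pair of co-occurring mutations.

    gamma maps an index pair (i, j) with i < j to an interaction coefficient;
    pairs missing from gamma are treated as having no interaction.
    """
    total = additive_fitness(genotype, beta)
    for i, j in combinations(range(len(genotype)), 2):
        total += gamma.get((i, j), 0.0) * genotype[i] * genotype[j]
    return total


def count_pairs(num_mutations):
    """Number of candidate pairwise terms, which grows quadratically."""
    return num_mutations * (num_mutations - 1) // 2


# Hypothetical per-mutation effects and a single hypothetical interaction.
beta = [0.10, 0.05, 0.0]
gamma = {(0, 1): 0.20}

lineage_a = [1, 0, 0]  # carries mutation 0 only
lineage_b = [1, 1, 0]  # carries mutations 0 and 1

if __name__ == "__main__":
    for name, g in [("lineage_a", lineage_a), ("lineage_b", lineage_b)]:
        print(name, additive_fitness(g, beta), pairwise_fitness(g, beta, gamma))
    print("candidate pairs for 1000 mutations:", count_pairs(1000))
```

Under the linear model the two mutations in `lineage_b` contribute independently, while the pairwise model adds an extra term only when both are present. The `count_pairs` helper shows why the quoted text suggests restricting the set of pairs considered: the number of candidate interaction terms grows quadratically with the number of mutations.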