Decomposing the sources of SARS-CoV-2 fitness variation in the United States

Abstract

The fitness of a pathogen is a composite phenotype determined by many different factors influencing growth rates both within and between hosts. Determining what factors shape fitness at the host population-level is especially challenging because both intrinsic factors like pathogen genetics and extrinsic factors such as host behavior influence between-host transmission potential. This challenge has been highlighted by controversy surrounding the population-level fitness effects of mutations in the SARS-CoV-2 genome and their relative importance when compared against non-genetic factors shaping transmission dynamics. Building upon phylodynamic birth–death models, we develop a new framework to learn how hundreds of genetic and non-genetic factors have shaped the fitness of SARS-CoV-2. We estimate the fitness effects of all amino acid variants and several structural variants that have circulated in the United States between February 2020 and March 2021 from viral phylogenies. We also estimate how much fitness variation among pathogen lineages is attributable to genetic versus non-genetic factors such as spatial heterogeneity in transmission rates. Before September 2020, most fitness variation between lineages can be explained by background spatial heterogeneity in transmission rates across geographic regions. Starting in late 2020, genetic variation in fitness increased dramatically with the emergence of several new lineages including B.1.1.7, B.1.427, B.1.429 and B.1.526. Our analysis also indicates that genetic variants in less well-explored genomic regions outside of Spike may be contributing significantly to overall fitness variation in the viral population.

Article activity feed

  1. SciScore for 10.1101/2020.12.14.422739:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: Genomes were aligned using MAFFT (Katoh and Standley, 2013).
    Resource: MAFFT
    Suggested: (MAFFT, RRID:SCR_011811)

    Sentence: A maximum likelihood (ML) phylogenetic tree was reconstructed in RAxML (Stamatakis, 2014) using the rapid bootstrapping method with 10 bootstrap replicates assuming a GTR model of sequence evolution with Gamma-distributed rate variation among sites.
    Resource: RAxML
    Suggested: (RAxML, RRID:SCR_006086)
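
The two quoted sentences describe an alignment step followed by a maximum likelihood tree search with rapid bootstrapping. The authors' exact command lines are not given here, so the following is only a rough sketch of how such invocations might be assembled. The helper names, file names, seed value, the `raxmlHPC` binary name, and the MAFFT `--auto` strategy are all assumptions for illustration; only the 10 bootstrap replicates and the GTR + Gamma model come from the text.

```python
def mafft_command(input_fasta):
    """Build a MAFFT argument list (hypothetical helper).

    MAFFT writes the alignment to standard output, so the caller is
    expected to redirect stdout to a file. `--auto` lets MAFFT pick a
    strategy; the strategy actually used by the authors is not stated.
    """
    return ["mafft", "--auto", input_fasta]


def raxml_command(alignment, run_name, seed=12345, replicates=10):
    """Build a RAxML argument list (hypothetical helper).

    -f a         rapid bootstrap analysis plus best-scoring ML tree search
    -m GTRGAMMA  GTR substitution model with Gamma rate variation
    -p / -x      random seeds for parsimony starts and rapid bootstrapping
    -#           number of bootstrap replicates (10 in the quoted sentence)
    -s / -n      input alignment and output run name
    """
    return [
        "raxmlHPC",            # assumed binary name; installs often differ
        "-f", "a",
        "-m", "GTRGAMMA",
        "-p", str(seed),       # assumed seed, not from the source
        "-x", str(seed),
        "-#", str(replicates),
        "-s", alignment,
        "-n", run_name,
    ]


align_cmd = mafft_command("genomes.fasta")             # assumed file name
tree_cmd = raxml_command("aligned.fasta", "sarscov2")  # assumed names
```

Each list could be passed to `subprocess.run`, with the MAFFT call's stdout redirected to the alignment file that RAxML then reads.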

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    While our phylodynamic inference framework accounts for many potentially confounding factors including background fitness variation, our analysis still has a number of limitations. First, inferences of pathogen fitness from phylogenies will inevitably depend on what lineages are sampled and included in the phylogeny. Although we did not try to directly correct for sampling biases in the GISAID database, we included spatiotemporal effects in our model in order to account for differences in either background transmission rates or sampling fractions over time. Moreover, estimated transmission rates did not significantly vary depending on whether we assume a constant sampling fraction or try to explicitly model how sampling fractions vary over space and time. Second, we chose a simple fitness mapping function that assumes each feature has a multiplicative effect on lineage fitness (such that log fitness is an additive linear function of features). In reality, the relationship between a pathogen’s genotype, environment and other features may be considerably more complex due to nonlinear relationships between features and fitness or interactions among genetic features (epistasis) and the environment (GxE interactions). Learning what types of functions are expressive enough to capture these complexities while remaining statistically tractable and biologically interpretable is a major challenge for future work. Finally, the computational efficiency of our approach relies on fir...
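
The second limitation above describes a fitness mapping in which each feature acts multiplicatively, so that log fitness is a linear sum over features. A minimal sketch of that idea follows. It is not the authors' implementation: the function name, the feature names, the effect sizes, and the baseline fitness of 1.0 are all hypothetical values chosen for illustration.

```python
import math


def lineage_fitness(features, effects, base_fitness=1.0):
    """Multiplicative fitness mapping (illustrative sketch).

    features: dict of feature name -> indicator or value x_i for a lineage
    effects:  dict of feature name -> multiplicative effect b_i (> 0)

    Fitness is base * prod(b_i ** x_i), computed on the log scale so
    that each feature contributes additively to log fitness.
    """
    log_fitness = math.log(base_fitness)
    for name, x in features.items():
        log_fitness += x * math.log(effects[name])
    return math.exp(log_fitness)


# Hypothetical effect sizes: values above 1 raise fitness, below 1 lower it.
effects = {"variant_A": 1.10, "variant_B": 0.95, "region_X": 1.20}

# A hypothetical lineage carrying variant_A and circulating in region_X.
lineage = {"variant_A": 1, "variant_B": 0, "region_X": 1}

fitness = lineage_fitness(lineage, effects)  # 1.10 * 1.20
```

Under this form the effect of a genetic feature does not depend on which other features are present, which is exactly why epistasis and GxE interactions fall outside the model as described.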

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.