Modeling Site-Specific Mutation Patterns in Pandemic-Scale Phylogenetics


Abstract

Models of genome evolution often account for different evolutionary rates at different genome positions due to, e.g., varying selective pressures or mutation rates. Recent evidence from millions of publicly shared SARS-CoV-2 genomes has revealed a more complex mutational landscape than can be modeled with existing approaches. In these data, mutation rates are not only highly position-specific, as currently modeled, but also nucleotide-specific: a particular mutation can recur very often at a given genome position, while other mutations at the same position are not highly recurrent.

Here, we propose and investigate a general model of genome evolution where each genome position is allowed to evolve under an independent, non-normalized substitution rate matrix describing site-specific rates of all mutation types (“Site-Specific Matrix” model, or SSM). We implement SSM in the efficient pandemic-scale phylogenetic inference software CMAPLE.
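To make the contrast with classical rate variation concrete, the following is a minimal sketch in plain Python, not the CMAPLE implementation. It assumes a classical model can be represented as one shared rate matrix multiplied by a per-site scalar, and an SSM-style model as a separate matrix per site whose rows sum to zero but which is not rescaled to a fixed overall rate. All names (`make_rate_matrix`, `scaled`, `short_branch_prob`) and all numeric rates are hypothetical and chosen only for illustration.

```python
NUCS = "ACGT"

def make_rate_matrix(off_diag):
    """Build a 4x4 rate matrix from off-diagonal rates {(from, to): rate}.

    Diagonal entries are set so that each row sums to zero.
    No global normalization is applied (the matrix is "non-normalized").
    """
    q = [[0.0] * 4 for _ in range(4)]
    for (a, b), rate in off_diag.items():
        q[NUCS.index(a)][NUCS.index(b)] = rate
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q

def scaled(q, factor):
    """Multiply every entry of a rate matrix by a scalar."""
    return [[factor * x for x in row] for row in q]

def short_branch_prob(q, a, b, t):
    """First-order approximation P(a -> b within time t) ~ q_ab * t (small t only)."""
    return q[NUCS.index(a)][NUCS.index(b)] * t

# Illustrative shared matrix: all substitutions equally likely (made-up values).
base = {(a, b): 1.0 for a in NUCS for b in NUCS if a != b}
shared_q = make_rate_matrix(base)

# Classical rate variation: a fast site is the shared matrix times a scalar,
# so the ratio between any two mutation types is the same at every site.
classical_site = scaled(shared_q, 5.0)

# SSM-style site: its own matrix, here with a strongly elevated C->T rate
# while every other mutation type at the same position stays rare.
ssm_rates = {k: 0.1 for k in base}
ssm_rates[("C", "T")] = 20.0
ssm_site = make_rate_matrix(ssm_rates)

t = 1e-4  # hypothetical short branch length
classical_ratio = (short_branch_prob(classical_site, "C", "T", t)
                   / short_branch_prob(classical_site, "C", "A", t))
ssm_ratio = (short_branch_prob(ssm_site, "C", "T", t)
             / short_branch_prob(ssm_site, "C", "A", t))
```

Under these assumptions, scaling the shared matrix changes how fast a site evolves but leaves the C->T to C->A ratio untouched, whereas a per-site matrix can express a site where one mutation type dominates.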

Large-scale genomic epidemiological simulations suggest that, given enough data, SSM can accurately infer position- and nucleotide-specific substitution rates for more frequently observed nucleotides (typically the reference nucleotide), while other rates require higher levels of divergence. Simulations also show that SSM has a modest impact on the accuracy of phylogenetic tree estimation. We use SSM to analyze the evolution of millions of SARS-CoV-2 genomes and observe substantial mismatches between the substitution rates of classical rate variation models and our SSM estimates. These results suggest that classical models of rate variation are inadequate for modeling site-specific mutation patterns and that SSM is a useful alternative for large-scale genome analyses.
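One plausible reading of why rates for the commonly observed nucleotide are easier to infer can be illustrated with a simple count-over-exposure estimator. This is a sketch under stated assumptions, not the inference procedure used in CMAPLE: it assumes mutation events and the total branch length spent in each nucleotide state at a site are known exactly, and it uses the standard Poisson approximation that the relative standard error of such an estimate is about 1/sqrt(n) for n observed events. The function names and all counts below are hypothetical.

```python
import math

def estimate_rates(mutation_counts, exposure):
    """Estimate rate(a -> b) as count(a -> b) / branch length spent in state a.

    Returns None for a rate whose source nucleotide was never observed.
    """
    rates = {}
    for (a, b), n in mutation_counts.items():
        t = exposure.get(a, 0.0)
        rates[(a, b)] = n / t if t > 0 else None
    return rates

def relative_std_error(n_events):
    """Approximate relative standard error of a Poisson-based rate estimate."""
    return 1.0 / math.sqrt(n_events) if n_events > 0 else float("inf")

# Made-up data for one site whose reference nucleotide is C:
# almost all branch length is spent in state C, very little in state T.
counts = {("C", "T"): 400, ("C", "A"): 4, ("T", "C"): 1}
exposure = {"C": 1000.0, "T": 2.0}

rates = estimate_rates(counts, exposure)
errors = {pair: relative_std_error(n) for pair, n in counts.items()}
```

In this toy example the C->T estimate rests on 400 events and has a relative error of about 5%, while the T->C estimate rests on a single event and is essentially unconstrained, which is consistent with the statement that rates away from rarely observed nucleotides require more divergence to estimate.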
