Horse, not zebra: accounting for lineage abundance in maximum likelihood phylogenetics

Abstract

Maximum likelihood phylogenetic methods are popular approaches for estimating evolutionary histories from genome data. These methods make no prior assumptions about how the sequenced genomes were selected. However, in genomic epidemiology the sequencing rate is often agnostic to the specific pathogen strain considered. In this scenario, a pathogen strain's prevalence should be reflected in its relative abundance in the genome data. Here, I show that this simple assumption, when appropriate and incorporated within maximum likelihood phylogenetics, greatly improves the accuracy of phylogenetic inference.

I introduce and assess two separate approaches to achieve this. The first approach rescales the likelihood of a phylogenetic tree by the number of distinct binary topologies obtainable by arbitrarily resolving multifurcations in the tree. This approach interprets a multifurcation as reflecting a lack of signal for resolving a bifurcating topology, rather than as an instantaneous multifurcating event. The second approach instead includes a tree prior that assumes that genomes are sequenced at a rate proportional to their abundance.
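
As a rough illustration of the first approach, the sketch below counts the binary resolutions of a rooted tree and applies the count as a rescaling factor in log space. It relies on the standard combinatorial result that a node with k children can be resolved into (2k-3)!! distinct rooted binary topologies, and it assumes that the counts multiply across independent multifurcations. The function names and the input format (a list of child counts, one per internal node) are hypothetical and chosen for illustration only; this is not MAPLE's code and may differ from the exact rescaling used there.

```python
import math


def odd_double_factorial(n):
    """Return n!! for odd n, with the convention (-1)!! = 1!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def num_binary_resolutions(child_counts):
    """Count rooted binary topologies obtainable by resolving every node.

    child_counts holds the number of children of each internal node.
    A node with k children contributes (2k-3)!! resolutions
    (assumption: rooted tree, resolutions at different nodes are independent).
    """
    total = 1
    for k in child_counts:
        if k < 2:
            raise ValueError("an internal node needs at least 2 children")
        total *= odd_double_factorial(2 * k - 3)
    return total


def rescaled_log_likelihood(log_likelihood, child_counts):
    """Add the log of the resolution count to a tree's log-likelihood."""
    return log_likelihood + math.log(num_binary_resolutions(child_counts))
```

Under these assumptions, a tree whose only multifurcation has four children would have its likelihood multiplied by 15, so placing a new genome at an already large multifurcation is rewarded more than placing it on a sparsely sampled branch.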

Both approaches favor phylogenetic placement at abundant lineages, and dramatically improve the accuracy of phylogenetic inference in scenarios like SARS-CoV-2 phylogenetics, where large multifurcations are common. This considerable impact is also observed in real pandemic-scale SARS-CoV-2 genome data, where accounting for lineage prevalence reduces phylogenetic uncertainty by around one order of magnitude. Both approaches are implemented in the open-source phylogenetic software MAPLE v0.7.5.4 (https://github.com/NicolaDM/MAPLE).
