Diagnostics of dated phylogenies in microbial population genetics


Abstract

Microbial population genetic studies often use a dated phylogeny to show how the genomes are related over a relevant timescale. Many tools have recently been developed to date the nodes of a standard phylogeny, but all make underlying assumptions that may not be realistic for a given dataset, making the results potentially unreliable. Model comparison is sometimes used to remedy this issue, whereby inferences under several models are compared to establish which result can be trusted. Although such comparison is clearly useful for assessing the relative merits of several inference attempts, here we focus instead on the problem of evaluating how good an inference is in absolute terms, without comparison. We consider several approaches for diagnosing potential issues in a reconstructed dated phylogeny, including outlier detection, posterior predictive checking and residual analysis. These methods are well-established diagnostic tools in other areas of statistics, but here we show how they can be applied specifically to the inference of dated phylogenies. We illustrate their use on many simulated datasets, with inference performed either under the correct model, to quantify the specificity of the diagnostic methods, or under an incorrect model, to quantify their sensitivity. We also apply the methods to three real-life datasets to showcase the range of issues that they can detect. We advocate the use of these diagnostic tools in all microbial population genetic studies that involve the reconstruction of a dated phylogeny.
