Likelihood-based evaluation of character recoding schemes for phylogenetic analysis
Abstract
Character recoding is a common practice in evolutionary studies. For example, phylogenies can be inferred from protein-coding DNA sequences via 61-state codon substitution models. Often, however, inference instead relies on 20-state amino acid replacement models that do not explicitly consider synonymous substitutions. When there is substantial heterogeneity of amino acid frequencies among sites and/or among lineages, another sort of character recoding is sometimes performed. In these cases, one option is to reduce the state space of the model by placing each of the 20 amino acids into one of a relatively small number (e.g., 6) of groups and then modeling only how group membership changes at a site over evolutionary time. Unfortunately, these kinds of character recoding schemes are prone to reducing the amount of available evolutionary information. Here, we provide a likelihood framework to statistically assess recoding schemes. Although we concentrate on the recoding of 61-state codon substitution models into 20-state amino acid replacement models, the general approach is also relevant to other recoding schemes, such as those that recode 20-state models into 6-state models.
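To make the two kinds of recoding concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard genetic code for the 61-to-20 step and, purely for illustration, the widely used six-group Dayhoff partition for the 20-to-6 step; the abstract does not specify which six groups are meant. The function names `recode_codons` and `recode_amino_acids` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with
# bases T, C, A, G and the third position varying fastest. "*" marks stops.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# 61 sense codons -> 20 amino acids (stop codons are excluded).
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
    if aa != "*"
}

# Assumption: Dayhoff-style six-group partition, used here only as an example
# of a 20-state to 6-state recoding scheme.
DAYHOFF6_GROUPS = ["AGPST", "DENQ", "HKR", "ILMV", "FWY", "C"]
AA_TO_GROUP = {
    aa: str(index)
    for index, group in enumerate(DAYHOFF6_GROUPS)
    for aa in group
}


def recode_codons(dna: str) -> str:
    """Recode an in-frame sequence of sense codons into amino acids."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    return "".join(CODON_TO_AA[codon] for codon in codons)


def recode_amino_acids(protein: str) -> str:
    """Recode amino acids into group labels '0'..'5'."""
    return "".join(AA_TO_GROUP[aa] for aa in protein)


# Two sequences that differ only by a synonymous change (GCT vs. GCC)
# become identical after recoding to amino acids.
seq_a = "ATGGCTGAA"
seq_b = "ATGGCCGAA"
protein_a = recode_codons(seq_a)
protein_b = recode_codons(seq_b)
groups_a = recode_amino_acids(protein_a)
```

In this sketch, `seq_a` and `seq_b` map to the same amino acid string, which illustrates how recoding can discard information: the synonymous difference between them is invisible to a 20-state model, and further differences within a group disappear under the 6-state recoding.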