How many characters are needed to reconstruct a phylogeny?
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Despite increased recent attention towards Bayesian phylogenetics and its applications in understanding macroevolutionary processes, it remains unclear how many discrete characters are needed to accurately estimate tree topologies in a Bayesian framework. This could be particularly relevant for morphological datasets used in phylogenetics, as they usually consist of few dozens to few hundreds of characters—orders of magnitude smaller than most molecular datasets.
I designed a simulation study in the software RevBayes to explore how the number of sampled discrete characters affects accuracy and precision of Bayesian phylogenetic estimates, under various setups differing in number of taxa, average number of state changes per character (i.e., tree length), and number of states per character. Results indicate that between 100 and 500 variable characters are necessary to reach sufficient accuracy and precision of phylogenetic estimates for as low as 20 tips. All other parameters being equal, multistate characters produce slightly more accurate estimates than binary characters, and more labile characters produce more accurate estimates for trees above 50 tips. The results of this study highlight the continuous need for global research efforts geared towards the characterization and digitization of interspecific morphological diversity in both extant and extinct taxa.