Testing the validity and adequacy of linguistic phylogenetic analyses

Abstract

Bayesian phylogenetics has become a standard tool in historical linguistics, and for the most part it implements models borrowed from evolutionary biology. Too little work has been done to validate the analysis set-up that has become standardised in phylolinguistics, which consists of binary data with ascertainment bias, data partitions with correlated site number and rate, the binary covarion substitution model, and the uncorrelated lognormal branch rate model. Here I perform a set of simulation-based calibration studies to test a typical phylolinguistic analysis in the software BEAST2. Although the analysis correctly recovers the parameters of the substitution model, complications arise from the combination of ascertainment bias and partitions of unequal length and rate. Reweighting the partition-specific rates by the number of sites, as is the default behaviour, leads to poorly calibrated posteriors. An alternative approach, in which each meaning is assumed to come from an alignment of equal length, behaves correctly in simulations and fits empirical data better. I also assess the adequacy of the covarion substitution model through posterior predictive simulations. The covarion model falls short of approximating the true process of lexical evolution, likely because of the prevalence of semantic shift and the non-independence of cognate substitutions in real data. This work highlights the importance of thoroughly testing models and their implementations in phylolinguistics, as well as the need for further research on improving models of lexical evolution.
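
The general idea behind a simulation-based calibration check can be illustrated with a minimal sketch. This is not the BEAST2 analysis described in the abstract: it assumes a toy model (a normal prior on one parameter with normally distributed observations) so that the posterior is available in closed form, and the function name `sbc_coverage` and all of its parameters are hypothetical. If inference is well calibrated, roughly 95% of the 95% credible intervals should contain the value the data were simulated from.

```python
import random
import statistics


def sbc_coverage(n_reps=2000, n_obs=20, prior_sd=1.0, noise_sd=1.0, seed=1):
    """Return the fraction of replicates whose 95% interval covers the truth.

    Toy stand-in for a calibration study: normal prior, normal likelihood.
    """
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95% quantile
    hits = 0
    for _ in range(n_reps):
        # 1. Draw a "true" parameter value from the prior (mean 0).
        theta = rng.gauss(0.0, prior_sd)
        # 2. Simulate a data set under that parameter value.
        data = [rng.gauss(theta, noise_sd) for _ in range(n_obs)]
        # 3. Compute the posterior; it is conjugate here, so no MCMC is needed.
        post_prec = 1.0 / prior_sd**2 + n_obs / noise_sd**2
        post_mean = (sum(data) / noise_sd**2) / post_prec
        post_sd = post_prec ** -0.5
        # 4. Record whether the 95% credible interval contains the truth.
        if abs(theta - post_mean) <= z * post_sd:
            hits += 1
    return hits / n_reps


coverage = sbc_coverage()
print(f"95% interval coverage: {coverage:.3f}")
```

A coverage far from 0.95 in a check of this kind would point to a mismatch between the simulating model and the inference model, which is the sort of miscalibration the abstract reports for the default rate reweighting.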
