The paradox of SOV: A case for token-based typology
Abstract
This study addresses a paradox in word order typology. On the one hand, SOV order involves longer dependency distances, and therefore higher processing costs, than verb-medial order. On the other hand, it is the most frequent word order in the languages of the world. How come? An analysis of large-scale corpus data in thirty-two languages annotated with Universal Dependencies provides a simple answer: the costly long distances occur more rarely than one would assume because clauses in verb-final languages usually have fewer overt arguments than clauses in verb-medial languages. A series of Bayesian phylogenetic models shows a negative correlation between the proportion of verb-final clauses in a language and the average number of arguments in a clause, while controlling for argument indexing and high- and low-context culture. A closer examination of argument configurations reveals a positive correlation between proportions of verb-final clauses and proportions of subjectless clauses; as for proportions of objectless clauses, the evidence is less clear. In addition, a quantitative analysis of 150 Universal Dependencies corpora shows that the proportions of verb-final clauses with two overt arguments are low, even in verb-final languages. The study highlights the importance of the token-based, gradient approach to typology, which gives us insights into what kind of structures language users prefer, and what they avoid.
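The token-based measures mentioned above can be illustrated with a minimal sketch that reads CoNLL-U text and records, for each verb, the number of overt core arguments, whether the verb follows all of them, and the summed dependency distance to them. Everything here is an assumption made for illustration and not the study's actual operationalisation: clauses are identified by tokens tagged `VERB`, core arguments are restricted to `nsubj` and `obj` (including subtypes such as `nsubj:pass`), distance is counted in tokens, and the helper names `row`, `parse_conllu`, and `clause_stats`, as well as the toy sentences, are hypothetical.

```python
from collections import defaultdict

# Assumption: only these UD relations count as overt core arguments.
ARG_RELS = {"nsubj", "obj"}


def row(i, form, upos, head, deprel):
    """Build one 10-column CoNLL-U line (hypothetical helper for the toy data)."""
    return "\t".join([str(i), form, form, upos, "_", "_", str(head), deprel, "_", "_"])


def parse_conllu(text):
    """Yield each sentence as a list of (id, upos, head, deprel) tuples."""
    sent = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if sent:
                yield sent
                sent = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword tokens ("1-2") and empty nodes ("1.1").
        if not cols[0].isdigit():
            continue
        sent.append((int(cols[0]), cols[3], int(cols[6]), cols[7]))
    if sent:
        yield sent


def clause_stats(sentence):
    """Return one record per VERB token in the sentence."""
    args = defaultdict(list)
    for tok_id, _upos, head, deprel in sentence:
        if deprel.split(":")[0] in ARG_RELS:
            args[head].append(tok_id)
    stats = []
    for tok_id, upos, _head, _deprel in sentence:
        if upos != "VERB":
            continue
        arg_ids = args.get(tok_id, [])
        stats.append({
            "n_args": len(arg_ids),
            # Left undefined (None) when the clause has no overt argument.
            "verb_final": all(a < tok_id for a in arg_ids) if arg_ids else None,
            "dep_distance": sum(abs(tok_id - a) for a in arg_ids),
        })
    return stats


# Toy data: an SOV clause, an SVO clause, and a subjectless OV clause.
TOY = "\n".join([
    row(1, "dog", "NOUN", 3, "nsubj"),
    row(2, "cat", "NOUN", 3, "obj"),
    row(3, "sees", "VERB", 0, "root"),
    "",
    row(1, "dog", "NOUN", 2, "nsubj"),
    row(2, "sees", "VERB", 0, "root"),
    row(3, "cat", "NOUN", 2, "obj"),
    "",
    row(1, "cat", "NOUN", 2, "obj"),
    row(2, "sees", "VERB", 0, "root"),
])

stats = [s for sent in parse_conllu(TOY) for s in clause_stats(sent)]
for s in stats:
    print(s)
```

On the toy data, the SOV clause with two overt arguments has a summed distance of 3, the SVO clause 2, and the subjectless OV clause 1. This is the arithmetic behind the paradox and its proposed resolution: a verb-final clause is costly only when both arguments are overt. Aggregating such records per corpus would give proportions of verb-final clauses and mean argument counts, under the assumptions listed above.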