The paradox of SOV: A case for token-based typology

Abstract

This study addresses a paradox in word order typology. On the one hand, the SOV order has longer dependency distances and therefore higher processing costs compared to verb-medial order. On the other hand, it is the most frequent word order in languages of the world. How come? An analysis of large-scale corpus data in thirty-two languages annotated with Universal Dependencies provides a simple answer: the costly long distances occur more rarely than one would assume because verb-final languages usually have fewer arguments compared to verb-medial languages. A series of Bayesian phylogenetic models shows a negative correlation between the proportion of verb-final clauses in a language and the average number of arguments in a clause, while controlling for argument indexing and high- and low-context culture. A closer examination of argument configurations reveals a positive correlation between proportions of verb-final clauses and proportions of subjectless clauses; as for proportions of objectless clauses, the evidence is less clear. In addition, a quantitative analysis of 150 Universal Dependencies corpora shows that the proportions of verb-final clauses with two overt arguments are low, even in verb-final languages. The study highlights the importance of the token-based, gradient approach to typology, which gives us insights into what kind of structures language users prefer, and what they avoid.
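To make the two token-based measures in the abstract concrete, the following is a minimal sketch of how they might be computed from a treebank in CoNLL-U format. It is not the authors' pipeline. The operational choices are assumptions made for illustration: a clause is any token tagged `VERB`, its overt arguments are dependents labelled `nsubj`, `obj`, or `iobj`, and a clause counts as verb-final when all of its overt arguments precede the verb. The function names and the toy sentences are hypothetical.

```python
from collections import defaultdict

# Assumption: only these core relations count as overt arguments.
CORE_ARGS = {"nsubj", "obj", "iobj"}


def parse_conllu(text):
    """Yield sentences as lists of (id, head, upos, deprel) tuples."""
    sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        deprel = cols[7].split(":")[0]  # drop subtypes such as nsubj:pass
        sentence.append((int(cols[0]), int(cols[6]), cols[3], deprel))
    if sentence:
        yield sentence


def clause_stats(text):
    """Return clause count, mean overt arguments, and verb-final share."""
    clauses = with_args = verb_final = total_args = 0
    for sent in parse_conllu(text):
        args_by_head = defaultdict(list)
        for tok_id, head, _upos, deprel in sent:
            if deprel in CORE_ARGS:
                args_by_head[head].append(tok_id)
        for tok_id, _head, upos, _deprel in sent:
            if upos != "VERB":
                continue
            args = args_by_head.get(tok_id, [])
            clauses += 1
            total_args += len(args)
            if args:  # order is only defined when an argument is overt
                with_args += 1
                if all(arg_id < tok_id for arg_id in args):
                    verb_final += 1
    return {
        "clauses": clauses,
        "mean_args": total_args / clauses if clauses else 0.0,
        "verb_final_share": verb_final / with_args if with_args else 0.0,
    }


def _row(tok_id, form, upos, head, deprel):
    """Build one 10-column CoNLL-U line (unused columns are '_')."""
    return "\t".join(
        [str(tok_id), form, "_", upos, "_", "_", str(head), deprel, "_", "_"]
    )


# Hypothetical toy data: one SOV clause, one SVO clause, one bare verb.
SAMPLE = "\n".join([
    _row(1, "dog", "NOUN", 3, "nsubj"),
    _row(2, "cat", "NOUN", 3, "obj"),
    _row(3, "sees", "VERB", 0, "root"),
    "",
    _row(1, "dog", "NOUN", 2, "nsubj"),
    _row(2, "sees", "VERB", 0, "root"),
    _row(3, "cat", "NOUN", 2, "obj"),
    "",
    _row(1, "sleeps", "VERB", 0, "root"),
])

stats = clause_stats(SAMPLE)
```

Computed per language, values like `verb_final_share` and `mean_args` are the kind of gradient, corpus-derived quantities that the correlations in the abstract relate to one another. The phylogenetic modelling and the controls mentioned there are outside the scope of this sketch.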
