Ordinal random forests in language data analysis

Abstract

Ordinal outcome variables are common across branches of language research. For data analysis, linguists typically convert responses into numbers and then analyze them using averages or (mixed-effects) regression models. Because this strategy can yield misleading insights, it is welcome that the methodological literature has started to give due consideration to alternative, more adequate procedures. The current paper joins this discussion by demonstrating a modelling approach based on Random Forests, which have been gaining ground in language research. Due to their ability to uncover complex relationships in the data while actively guarding against overfitting, Random Forests are particularly attractive for exploratory work. The present paper relies on the frequency-adjusted borders ordinal forest (fabOF) framework, which has recently received a mixed-effects extension that handles hierarchically structured datasets in a principled way. To unlock the full interpretative potential of this tool, we have created partial dependence routines in R, which were previously unavailable. As we illustrate using a case study on lexical preference ratings in Maltese English, mixed-effects fabOFs can not only exceed ordinal mixed-effects regression models in predictive utility but also afford nuanced insights into patterns in the data that can be visualized effectively.
