Gold student meets star model: Predicting the interpretational diversity of novel compounds in an exploratory-confirmatory approach

Abstract

Almost all linguistic expressions are ambiguous to some extent and can be interpreted in different ways. This is especially the case for novel expressions that a speaker has never encountered, in particular combined concepts expressed via compounds such as /gold student/ or /monkey ring/. Although previous studies have shown that word embeddings (meaning representations derived from text-based language models) can encode the interpretational diversity of such expressions, these studies have been limited to a small, rigid, and high-level closed set of relational interpretations (e.g., 'student MADE OF gold', 'student ABOUT gold'). In contrast, the present study uses more ecologically valid open-format interpretations provided by human participants, which are then classified in a bottom-up manner in order to compute quantitative estimates of interpretational diversity. In an exploratory study on pre-existing data, we first investigate which measures derived from word embeddings capture interpretational diversity, with the vector norm of the embeddings emerging as the best predictor. In a subsequent high-powered confirmatory study, we then systematically select new items for maximal variation of this vector norm, and replicate the same pattern. This is the first study to show that text-based language models encode the unconstrained interpretational diversity of linguistic expressions, even within a single vector representation, and even for novel expressions that have never been observed in their training data.
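
To make the two quantities in the abstract concrete, the following is a minimal sketch rather than the study's actual pipeline. It assumes the vector norm is the Euclidean (L2) norm of a compound's embedding, and it uses Shannon entropy over interpretation categories as one possible diversity estimate; the abstract does not state which diversity measure or embedding model the authors used. The function names, the toy vector, and the category labels are all hypothetical.

```python
import math
from collections import Counter


def vector_norm(vec):
    """Euclidean (L2) norm of an embedding given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def interpretation_entropy(labels):
    """Shannon entropy (in bits) of the distribution of interpretation categories.

    Assumption: entropy stands in for "interpretational diversity" here;
    the abstract does not specify the measure the study used.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy data: a 2-dimensional "embedding" for one compound and
# four participant interpretations already sorted into two categories.
compound_vec = [3.0, 4.0]
labels = ["excellent student", "excellent student",
          "wealthy student", "wealthy student"]

norm = vector_norm(compound_vec)
entropy = interpretation_entropy(labels)
print(norm, entropy)
```

With real data, each compound would contribute one norm and one diversity estimate, and the relationship between the two could then be examined across items.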
