A Simulation-Based Slope Metric for Anchor List Reliability in Word Embedding Spaces
Abstract
Inducing semantic relations in word vector spaces, and analyzing how other words or entire documents discursively engage these relations, is a popular form of cultural analysis. We propose a reliability metric that is easily interpretable and agnostic to the type of relation. The metric, which we call the anchor reliability coefficient (or relco), is found by creating a synthetic document-term matrix of simulated documents that sequentially shift more of their probability mass from relation-relevant anchor terms to randomly drawn words, and then regressing the documents' similarity to an induced relation on the inverse randomness rank of the documents. We validate the metric at the word level with both expert- and crowd-sourced dictionaries and at the document level with expert-annotated social media posts.
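The procedure described in the abstract can be illustrated with a minimal sketch. It is one possible reading, not the authors' implementation, and it rests on assumptions the abstract does not state: the relation is induced as the difference between the centroids of two anchor poles, a document's vector is the probability-weighted average of its word vectors, similarity is cosine similarity, and the regression is ordinary least squares. The function name `relco_sketch`, its parameters, and the toy embedding matrix are all hypothetical.

```python
import numpy as np


def relco_sketch(emb, pole_a, pole_b, n_steps=20, seed=0):
    """Slope of document-relation similarity on inverse randomness rank.

    Hypothetical sketch: emb is a (n_words, dim) embedding matrix and
    pole_a / pole_b are lists of row indices for the two anchor poles.
    """
    rng = np.random.default_rng(seed)
    n_words = emb.shape[0]

    # Assumption: induce the relation as the difference of pole centroids.
    direction = emb[pole_a].mean(axis=0) - emb[pole_b].mean(axis=0)
    direction = direction / np.linalg.norm(direction)

    # Random words are drawn from the non-anchor part of the vocabulary.
    anchors = np.concatenate([pole_a, pole_b])
    candidates = np.setdiff1d(np.arange(n_words), anchors)

    sims = np.empty(n_steps + 1)
    for k in range(n_steps + 1):
        noise_share = k / n_steps  # probability mass moved to random words
        weights = np.zeros(n_words)  # one row of the synthetic document-term matrix
        weights[pole_a] += (1.0 - noise_share) / len(pole_a)
        random_words = rng.choice(candidates, size=len(pole_a), replace=False)
        weights[random_words] += noise_share / len(random_words)

        # Assumption: document vector = weighted average of word vectors.
        doc_vec = weights @ emb
        sims[k] = doc_vec @ direction / np.linalg.norm(doc_vec)

    # Least random document gets the highest rank, most random the lowest.
    inverse_rank = np.arange(n_steps, -1, -1)
    slope, _intercept = np.polyfit(inverse_rank, sims, deg=1)
    return slope


# Toy data (fabricated for illustration only): 500 random word vectors,
# with two anchor poles pushed apart along a shared axis.
toy_rng = np.random.default_rng(42)
emb = toy_rng.normal(size=(500, 50))
axis = toy_rng.normal(size=50)
pole_a = list(range(0, 10))
pole_b = list(range(10, 20))
emb[pole_a] += 3 * axis
emb[pole_b] -= 3 * axis

coherent_slope = relco_sketch(emb, pole_a, pole_b)
print(f"slope for a coherent anchor list: {coherent_slope:.4f}")
```

Under these assumptions, a coherent anchor list should give a clearly positive slope, because similarity to the induced relation falls as documents shift mass toward random words. How the published metric scales or normalizes this slope is not specified in the abstract and is left open here.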