A Simulation-Based Slope Metric for Anchor List Reliability in Word Embedding Spaces

Marshall A. Taylor
Dustin S. Stoltz
Heather Harper
Sanuj Kumar
Sumanth Reddy Nandhikonda
Luke Burks

Abstract

Inducing semantic relations in word vector spaces, and analyzing how other words or entire documents discursively engage these relations, is a popular form of cultural analysis. We propose a reliability metric for the anchor lists used to induce such relations; the metric is easily interpretable and agnostic to the type of relation. The metric, which we call the anchor reliability coefficient (or relco), is computed by creating an artificial document-term matrix of simulated documents that sequentially shift more of their probability mass from relation-relevant anchor terms to non-anchor words, and then regressing the documents' similarity to the induced relation on their anchor inclusion scores. We validate the metric at the word level with both expert- and crowd-sourced dictionaries and at the document level with expert-annotated social media posts. We also provide heuristic baselines for assessing reliability effect sizes and for null hypothesis testing.
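The simulation-and-regression procedure described in the abstract can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. We assume that the relation is induced as the centroid of the anchor vectors (any other relation-inducing function can be passed through the hypothetical `induce_relation` parameter), that each simulated document is a bag of tokens sampled uniformly from the anchor and non-anchor vocabularies, that a document's vector is the count-weighted sum of its word vectors, and that the coefficient is the raw OLS slope. The names `simulate_relco`, `n_docs`, and `doc_len` are hypothetical.

```python
import math
import random
from collections import Counter


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Element-wise mean of equal-length vectors (ASSUMED relation inducer)."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def ols_slope(x, y):
    """Slope of the least-squares line regressing y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_relco(embeddings, anchors, induce_relation=centroid,
                   n_docs=21, doc_len=100, seed=0):
    """Hypothetical sketch of a simulation-based anchor reliability slope.

    embeddings: dict mapping word -> vector (list of floats)
    anchors: anchor terms used to induce the relation
    induce_relation: ASSUMPTION, defaults to the anchor centroid
    """
    rng = random.Random(seed)
    anchor_set = set(anchors)
    non_anchors = [w for w in embeddings if w not in anchor_set]
    relation = induce_relation([embeddings[w] for w in anchors])
    dim = len(relation)

    scores, sims = [], []
    for i in range(n_docs):
        # Document 0 is all anchors; each later document shifts more of its
        # tokens from anchor terms to non-anchor words.
        k = round((1 - i / (n_docs - 1)) * doc_len)
        tokens = ([rng.choice(anchors) for _ in range(k)]
                  + [rng.choice(non_anchors) for _ in range(doc_len - k)])
        row = Counter(tokens)  # one row of the simulated document-term matrix
        # Document embedding: count-weighted sum of word vectors.
        doc_vec = [sum(c * embeddings[w][d] for w, c in row.items())
                   for d in range(dim)]
        scores.append(k / doc_len)  # anchor inclusion score
        sims.append(cosine(doc_vec, relation))

    # Regress similarity to the relation on the anchor inclusion score.
    return ols_slope(scores, sims)
```

Because the relation is passed in as a function, the same loop works for a centroid, a difference between two anchor poles, or any other induced relation, in keeping with the abstract's point that the metric is agnostic to relation type. In this sketch, a self-cancelling anchor list (for example, two anchors pointing in opposite directions) yields a zero-norm relation and therefore a slope of 0.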

Versions published on OSF Preprints:
10.31235/osf.io/sc2ub_v4, Feb 24, 2026
10.31235/osf.io/sc2ub_v3, Dec 9, 2025
10.31235/osf.io/sc2ub_v2, Jun 27, 2025
10.31235/osf.io/sc2ub_v1, Jun 27, 2025
