Neural Text Embeddings in Psychological Research: A Guide With Examples in R
Abstract
In this guide, we review neural embedding models and compare three methods of quantifying psychological constructs for use with embeddings: Distributed Dictionary Representation (DDR), Contextualized Construct Representation (CCR), and a novel approach, Correlational Anchored Vectors (CAV). We aim to cultivate an intuition for the geometric properties of neural embeddings and a sensitivity to methodological problems that can arise in their use. We argue that while LLM embeddings have the advantage of contextualization, decontextualized word embeddings may generalize better across text genres when cosine or dot product similarity metrics are used. The three methods of operationalizing psychological constructs in vector space likewise each have advantages in particular applications. We recommend DDR, which derives a vector representation from a word list, for quantifying abstract constructs relating to the overall feel of a text, especially when the research requires that these constructs generalize across multiple genres of text. We recommend CCR, which derives a representation from a questionnaire, for cases in which the texts are relatively similar in content to the embedded questionnaire, such as experiments in which participants respond to a related prompt. CAV, which derives a representation from labeled examples, requires suitably large and reliable training data.
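As a rough illustration of the DDR idea described above (a construct vector derived from a word list, compared to texts via cosine similarity), the following minimal R sketch uses made-up toy embeddings and dictionary words; it is not the guide's actual pipeline, only a sketch of the geometry involved.

```r
# Cosine similarity between two vectors.
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Toy word embeddings for a hypothetical "calmness" dictionary
# (rows = words, columns = embedding dimensions; values are illustrative).
word_vectors <- rbind(
  calm   = c(0.20, 0.70, 0.10),
  peace  = c(0.25, 0.65, 0.05),
  serene = c(0.30, 0.60, 0.15)
)

# DDR-style construct representation: the mean of the dictionary word vectors.
ddr_vector <- colMeans(word_vectors)

# Embedding of a text to be scored (in practice, produced by an embedding model).
text_vector <- c(0.22, 0.68, 0.12)

# Construct score: cosine similarity between the text and the construct vector.
cosine_sim(text_vector, ddr_vector)
```

CCR and CAV follow the same scoring logic but replace the averaged word-list vector with a representation derived from an embedded questionnaire or from labeled training examples, respectively.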