Neural Text Embeddings in Psychological Research: A Guide With Examples in R
Abstract
In this guide, we review neural embedding models and compare three methods of quantifying psychological constructs for use with embeddings: Distributed Dictionary Representation (DDR), Contextualized Construct Representation (CCR), and a novel approach, Correlational Anchored Vectors (CAV). We aim to cultivate an intuition for the geometric properties of neural embeddings and a sensitivity to methodological problems that can arise in their use. We argue that while LLM embeddings have the advantage of contextualization, decontextualized word embeddings may generalize better across text genres when cosine or dot product similarity metrics are used. The three methods of operationalizing psychological constructs in vector space likewise each have advantages in particular applications. We recommend DDR, which derives a vector representation from a word list, for quantifying abstract constructs relating to the overall feel of a text, especially when the research requires that these constructs generalize across multiple genres of text. We recommend CCR, which derives a representation from a questionnaire, for cases in which the texts are relatively similar in content to the embedded questionnaire, such as experiments in which participants respond to a related prompt. CAV, which derives a representation from labeled examples, requires suitably large and reliable training data.
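As a rough illustration of the DDR idea described above (a construct vector derived from a word list, compared to texts via cosine similarity), the following minimal R sketch uses made-up toy embeddings and dictionary words; it is not the guide's actual pipeline, only a sketch of the geometry involved.

```r
# Cosine similarity between two vectors.
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Toy word embeddings for a hypothetical "calmness" dictionary
# (rows = words, columns = embedding dimensions; values are illustrative).
word_vectors <- rbind(
  calm   = c(0.20, 0.70, 0.10),
  peace  = c(0.25, 0.65, 0.05),
  serene = c(0.30, 0.60, 0.15)
)

# DDR-style construct representation: the mean of the dictionary word vectors.
ddr_vector <- colMeans(word_vectors)

# Embedding of a text to be scored (in practice, produced by an embedding model).
text_vector <- c(0.22, 0.68, 0.12)

# Construct score: cosine similarity between the text and the construct vector.
cosine_sim(text_vector, ddr_vector)
```

CCR and CAV follow the same scoring logic but replace the averaged word-list vector with a representation derived from an embedded questionnaire or from labeled training examples, respectively.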