Text Psychometrics: Assessing Psychological Constructs in Text Using Natural Language Processing

Abstract

Natural language processing (NLP) has advanced tremendously thanks to innovations in large language models (LLMs) and generative AI. However, when these technologies are used to assess psychological constructs in text, they are generally not evaluated for the validity, reliability, and standardization typically expected of questionnaires that use rating scales and of diagnostic instruments. This study seeks to bridge that gap by demonstrating how to evaluate the psychometric properties of text-based models, an approach we call Text Psychometrics. We provide an overview of different NLP methods, compare how they address key constraints in psychological research (e.g., explainability and data privacy), and present a framework for evaluating these models on a broader range of psychometric properties than is typically assessed. We then carry out two studies to validate NLP models for psychological assessment from text.

In study 1, we evaluate concurrent criterion validity by classifying thousands of text-based crisis counseling conversations and Reddit posts into different types of mental health issues, using a wide array of methods, from traditional lexicons to open-source LLMs. We also introduce a novel way to evaluate NLP models for content validity: the degree to which a measure captures the expressions of a construct it is meant to capture. Importantly, we find that certain high-performing models fail to identify many obvious expressions of a construct; for example, one model that achieved a mean ROC AUC of 0.82 failed to detect 49% of obvious expressions, on average. Other NLP models used in the field for psychological assessment likely suffer from the same lack of content validity, yet this property is rarely evaluated.

In study 2, we evaluate prospective criterion validity by estimating how 49 known suicide risk factors are associated with imminent risk in crisis counseling conversations. Conversations that end in an emergency-service intervention for imminent suicide risk are characterized by mentions of lethal means for suicide, the police, depressed mood, and anxiety, as well as by the texter writing more than the counselor. This work extends findings from retrospective surveys by analyzing text data in an ecological, high-risk setting as it unfolds in real time.

In sum, we provide a blueprint for assessing and validating constructs in text through a variety of traditional and state-of-the-art methods that researchers can choose among depending on their particular constraints. While most NLP studies in psychology rely on a single type of metric to validate their models, we demonstrate why models should be validated more broadly when NLP and LLMs are used in psychological research.
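To make the content-validity idea concrete, below is a minimal, self-contained sketch of the kind of check study 1 describes: train a classifier, then measure what fraction of hand-written, face-valid ("obvious") expressions of the target construct it detects. This is an illustrative assumption, not the authors' code; the toy texts, labels, and choice of a TF-IDF + logistic regression pipeline are all invented stand-ins for whatever model is under evaluation.

```python
# Hedged sketch of a content-validity check: what share of "obvious"
# expressions of a construct does a trained model actually flag?
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical): 1 = expresses anxiety, 0 = does not.
texts = [
    "I can't stop worrying about everything",
    "my heart races and I feel panicky",
    "had a great relaxing weekend with friends",
    "looking forward to the game tonight",
]
labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Hand-crafted, face-valid expressions the measure should capture.
obvious = [
    "I feel anxious all the time",
    "I'm so nervous I can barely breathe",
    "the worry never stops",
]

# Content-validity estimate: fraction of obvious expressions detected.
flagged = model.predict(obvious)
coverage = flagged.sum() / len(obvious)
print(f"Detected {coverage:.0%} of obvious expressions")
```

A model can score well on a discrimination metric such as ROC AUC over a held-out sample and still show low coverage here, which is the gap the abstract highlights.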
