Multiple Methods for Visualizing Human Language: A Tutorial for Social and Behavioural Scientists
Abstract
Humans use language to communicate psychological experiences. Advancements in Natural Language Processing have enabled researchers to quantitatively study and assess psychological states and behaviors through language (e.g., well-being and mental health). Insights from language analyses are often best conveyed through visualizations. Data-driven visualizations of how language statistically relates to psychological dimensions can help interpret assessment scores, uncover meaningful patterns, and understand the intricate relationships between language use and mental states. Visualizations can be used to differentiate between related constructs (e.g., How does language related to depression versus anxiety differ?), understand the validity of assessment tools (e.g., Does the language related to high depression severity match with depression theory?), and explain language-based assessments (e.g., Which language is indicative of higher depression severity in a language-based assessment?).

This tutorial demonstrates how to transform language data into visual insights using the text and topics packages in R. We provide practical guidance on creating four types of visualizations based on different language analysis techniques, examining how language use patterns are statistically associated with a psychological construct. We visualize 1) individual words and phrases (n-grams), 2) topics (word clusters), 3) words in the word embedding space, and 4) relevant language examples. These different visualizations offer complementary perspectives, visualizing linguistic elements from individual words to overarching topics and text examples. The underlying methods range from correlations between individual words to more advanced techniques, including topic modeling and Large Language Models. By making these methods accessible, this tutorial aims to empower researchers to use language visualizations for meaningful, data-driven insights.