Measuring complex psychological and sociological constructs in large-scale text



Abstract

In recent years, there has been an increasing exchange between social and computational sciences. Methods from natural language processing enable social scientists to systematically process large amounts of text. Rich psycho-sociological domain knowledge helps machine learning scholars to build valid models. Greater methodological interdisciplinarity is needed to successfully implement mixed methods approaches. Our guidelines provide detailed, hands-on advice on leveraging human data annotation and automatic text classification at scale, an approach applicable from exploration and theory building to confirmatory hypothesis testing. We outline methodological considerations, potential problems, and respective solutions throughout the process. Using an example from our own research on countering online hate, we describe five stages: (1) classification scheme development, (2) data labeling, (3) model selection, (4) model training and performance improvement, and (5) statistical analysis. Our guidelines demonstrate how integrating expertise from social sciences and machine learning can enhance the study of diverse social phenomena.
