Measuring complex psychological and sociological constructs in large-scale text

Alina Herderich
Jana Lasser
Mirta Galesic
Segun Taofeek Aroyehun
David Garcia
Joshua Garland

Abstract

In recent years, there has been an increasing exchange between social science and machine learning. In principle, natural language processing enables social scientists to systematically process large amounts of text, while rich domain knowledge helps machine learning scholars to build valid models of social phenomena. However, there is a lack of clear guidelines for constructing valid and reliable mixed-methods approaches, which can increase the rigor and comparability of computational social science research. We provide a set of guidelines for leveraging human data annotation and automatic text classification at scale in five stages: (1) classification scheme development, (2) data labeling, (3) model selection, (4) model training and performance improvement, and (5) statistical analysis. Using examples from our own research on countering online hate, we outline potential problems and their respective solutions. We demonstrate how consistently integrating expertise from social science and machine learning can enhance the study of diverse social phenomena.

Version published to 10.31234/osf.io/tzc9p_v2 on OSF Preprints
Dec 4, 2025
Version published to 10.31234/osf.io/tzc9p on OSF Preprints
Aug 1, 2024

Related articles

Uses and Misuses of Large Language Models in Qualitative Research

This article has 1 author:
1. Jonathan Ben-Menachem
This article has no evaluations
Latest version Mar 17, 2026

Mapping the changing landscape of American experimental psychology through data-driven topic modeling.

This article has 1 author:
1. Hiroshi Matsui
This article has no evaluations
Latest version Mar 12, 2026

Artificial Intelligence - Partner in research, or the fifth wheel

This article has 4 authors:
1. Lejo Buning
2. Jos van Hilligersberg
3. Peter Schuur
4. Frans de Vijlder
This article has no evaluations
Latest version Mar 25, 2026
