A computational qualitative approach to large-scale characterization of cultural identities on social media

Abstract

Social groups and identities play a central role in social research, yet systematically identifying the symbolic markers that define group boundaries can be difficult, particularly in large, unlabeled textual corpora. This article advances an existing technique for computational lexicon extraction, the “frequency-based” method, to automatically identify cultural and social identity signals at scale, even in the absence of reliable labeled data. By using a small set of highly representative “seed” terms to assign noisy, low-quality labels to authors, the approach locates authors who share specialized vocabularies and thereby reveals dense patterns of boundary-specific words. Consistent with computational grounded theory (CGT), which stresses iterative refinement informed by interpretive insight, choosing appropriate seed terms requires substantial knowledge of the groups under observation before lexicon extraction begins. A companion study, also published in this volume, illustrates the technique’s effectiveness in mapping the symbolic boundaries of white nationalist discourse on Twitter, showing how automated lexicon extraction can feed directly into traditional qualitative practices such as close reading and manual coding. The discussion situates this mixed-methods approach within CGT while challenging a presupposition, implicit in CGT and in computational social science more broadly, that the use of computational tools must culminate in quantitative analysis. Instead, this approach demonstrates how natural language processing can serve as an upstream step that enriches and expedites qualitative inquiry by revealing hidden patterns and vocabularies that might otherwise remain invisible to traditional ethnography. In this way, computational approaches to text are positioned not as replacements for qualitative research but as catalysts for deeper, more nuanced qualitative work.
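The abstract does not spell out the scoring rule, so the following is only a minimal sketch of one common variant of frequency-based lexicon extraction, not the article's implementation. It assumes that any author who uses at least one seed term receives a noisy in-group label, and that each remaining word is ranked by the ratio of its add-alpha smoothed relative frequency among in-group authors to that among all other authors. The function names, the tokenizer regex, the smoothing parameter `alpha`, and the `min_authors` threshold are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase words, keeping hashtags, mentions, apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9#@']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def label_authors(posts_by_author, seed_terms):
    """Assign noisy labels: True if any of an author's posts contains a seed term."""
    seeds = {s.lower() for s in seed_terms}
    labels = {}
    for author, posts in posts_by_author.items():
        vocab = {tok for post in posts for tok in tokenize(post)}
        labels[author] = bool(vocab & seeds)
    return labels


def extract_lexicon(posts_by_author, seed_terms, top_k=20, alpha=1.0, min_authors=2):
    """Rank non-seed words by smoothed in-group / out-group frequency ratio.

    Assumption: a word is a candidate only if at least `min_authors` distinct
    in-group authors use it, so one prolific account cannot dominate the list.
    """
    labels = label_authors(posts_by_author, seed_terms)
    seeds = {s.lower() for s in seed_terms}
    in_counts, out_counts = Counter(), Counter()
    in_authors = Counter()  # number of distinct in-group authors using each word
    for author, posts in posts_by_author.items():
        tokens = [tok for post in posts for tok in tokenize(post)]
        if labels[author]:
            in_counts.update(tokens)
            in_authors.update(set(tokens))
        else:
            out_counts.update(tokens)

    vocab_size = len(set(in_counts) | set(out_counts))
    in_total = sum(in_counts.values())
    out_total = sum(out_counts.values())

    scores = {}
    for word, n_authors in in_authors.items():
        if word in seeds or n_authors < min_authors:
            continue
        # Add-alpha smoothing keeps words unseen in the out-group from dividing by zero.
        p_in = (in_counts[word] + alpha) / (in_total + alpha * vocab_size)
        p_out = (out_counts[word] + alpha) / (out_total + alpha * vocab_size)
        scores[word] = p_in / p_out

    # Highest ratio first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


if __name__ == "__main__":
    posts = {
        "a1": ["seedword alpha beta", "alpha gamma"],
        "a2": ["seedword alpha delta"],
        "a3": ["beta gamma common"],
        "a4": ["common delta beta"],
    }
    print(extract_lexicon(posts, ["seedword"], min_authors=1))
```

Counting distinct in-group authors, rather than raw token frequency alone, is one way to favor vocabulary that is shared across a group over words idiosyncratic to a single account. In keeping with the qualitative emphasis above, the ranked list is best treated as material for close reading and manual coding rather than as a final result.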