Quantifying Social Media Narratives Using Large Language Models: A Scalable Workflow for Social Measurement
Abstract
Social science research faces an enduring tradeoff between the interpretive depth of qualitative methods and the statistical scale of quantitative approaches. This paper addresses that tension by introducing a scalable measurement workflow that transforms naturally occurring narrative text into reproducible quantitative variables, using large language models (LLMs) as configurable and auditable coding instruments. We analyze 6,814 public Reddit posts from caregiving communities to extract multidimensional measures of caregiving burden, emotional sentiment, and selected demographic attributes.

A key design feature of the framework is the inclusion of confidence scores for inferred attributes, which makes variation in inferability explicit within the measurement process. The resulting analyses show that LLM-assisted measurement can recover theoretically consistent patterns related to caregiving roles and life-course variation in burden, while also revealing structured differences in what information is narratively available for inference. In particular, demographic attributes vary sharply in inferability across posts, and analyses conditioned on higher-confidence inferences yield more pronounced and interpretable subgroup patterns.

As a methodological contribution, the study offers a transparent pipeline for integrating inductive pattern extraction from narrative data with deductive subgroup comparison, helping to close the interpretive-inferential loop. More broadly, the proposed workflow demonstrates how generative AI can be used to scale the analysis of meaning-rich narratives while remaining complementary to established qualitative and quantitative methods.
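To make the measurement design concrete, the sketch below illustrates one way an LLM coding response with per-attribute confidence scores could be parsed and thresholded, so that low-confidence demographic inferences are treated as "not inferable" rather than taken at face value. All names, the JSON response shape, and the 0.7 threshold are hypothetical illustrations, not the paper's actual implementation; the model call itself is stubbed out.

```python
import json

def parse_coding(raw_json: str, min_confidence: float = 0.7) -> dict:
    """Parse a (hypothetical) LLM coding response in which every coded
    attribute carries a 'value' and a 'confidence' in [0, 1].

    Attributes whose confidence falls below the threshold are recorded
    as None, making variation in inferability explicit in the data."""
    coded = json.loads(raw_json)
    out = {}
    for attr, entry in coded.items():
        conf = entry.get("confidence", 0.0)
        out[attr] = entry["value"] if conf >= min_confidence else None
    return out

# Stubbed model output for a single post (no API call; illustrative only).
stub_response = json.dumps({
    "burden": {"value": 8, "confidence": 0.90},
    "sentiment": {"value": -0.6, "confidence": 0.85},
    "caregiver_role": {"value": "adult child", "confidence": 0.75},
    "age_group": {"value": "40-49", "confidence": 0.40},
})

row = parse_coding(stub_response)
# 'age_group' falls below the 0.7 threshold, so it is coded as None
# rather than entering downstream subgroup comparisons.
```

Conditioning analyses on rows where an attribute is non-None then corresponds to the paper's "higher-confidence inference" subsamples.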