Automating Social Science: LLMs vs. Human Experts in Variable Relationship Identification

Abstract

Big data and computational methods are transforming social science research, enabling the analysis of increasingly complex phenomena. However, human researchers' capacity to synthesize vast datasets and model intricate relationships is limited. This study therefore evaluates the ability of large language models (LLMs; Qwen 2.5, Llama 3.1, and GPT-4) to identify variable relationships in social psychology, benchmarking their reasoning against both domain experts and non-domain experts. We collected 56 meta-analyses in social psychology published in 2024 and extracted 247 variable relationships from them. We then asked the LLMs and human experts to infer these relationships from variable definitions alone and compared their inferences with the relationships reported in the meta-analyses, while also examining how task difficulty, self-reported confidence, and relationship type affected model performance. LLMs and domain experts performed similarly in identifying simple variable relationships (e.g., linear relationships and difference tests), but more complex relationships, particularly moderating effects, challenged both. Domain expertise significantly improved identification accuracy. Model confidence correlated with accuracy but was not a strong predictor, and greater task complexity consistently reduced the performance of all LLMs. By empirically examining the reasoning capabilities of LLMs, this study highlights potential roles and limitations of algorithmic tools in social science and provides new evidence for data-driven academic practice.