Using large language models as a source of human behavioral data in social science experiments


Abstract

Large language models (LLMs) have prompted proposals to replace human subjects in social science experiments with simulated responses. Empirical evaluations suggest that this practice, often called silicon sampling, can sometimes approximate human behavior but is unreliable. We delineate where silicon sampling may still provide value and where it may not, but primarily study an alternative approach, in which model-based predictions are used not as substitutes for human data but as auxiliary measurements within randomized experiments. We formalize the inference of causal estimands from mixed-subjects randomized controlled trials, in which outcomes are observed for a subset of units while predictions are available for all units. Under transparent design conditions, we derive a family of estimators that remain unbiased for the average treatment effect in finite samples while exploiting predictions to reduce variance. We characterize when prediction-powered, calibration-based, arm-specifically tuned, and difference-in-predictions estimators improve precision, and we provide a software package that operationalizes these results and helps researchers jointly select estimators and allocate budgets between human data collection and prediction generation. Together, our results show how generative artificial intelligence can improve experimental social science without compromising scientific validity.
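To make the mixed-subjects idea concrete, the following is a minimal sketch of one prediction-powered estimator of the kind the abstract describes: for each arm, the mean of model predictions over all units is bias-corrected using the labeled subset where human outcomes were actually observed. The function names and data layout here are illustrative assumptions, not the paper's actual software package.

```python
import numpy as np

def ppi_arm_mean(y_labeled, f_labeled, f_all):
    """Prediction-powered estimate of one arm's mean outcome.

    y_labeled: observed human outcomes on the labeled subset
    f_labeled: model predictions for those same labeled units
    f_all:     model predictions for all units in the arm
    """
    # Mean of predictions over all units, plus a bias correction
    # estimated from the labeled subset; under random labeling this
    # is unbiased regardless of prediction quality.
    return f_all.mean() + (y_labeled - f_labeled).mean()

def ppi_ate(y1, f1_lab, f1_all, y0, f0_lab, f0_all):
    """Prediction-powered average treatment effect (treated minus control)."""
    return ppi_arm_mean(y1, f1_lab, f1_all) - ppi_arm_mean(y0, f0_lab, f0_all)
```

If the predictions are accurate, the correction term is small and the estimator's variance shrinks toward that of the full-sample prediction mean; if they are poor, the labeled-subset correction still removes the bias, at the cost of less variance reduction.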