LLMs in the Lab: Can AI Predict What Real Participants Do?
Abstract
Can large language models (LLMs) simulate participant-level datasets from experimental designs whose statistical properties (effect directions, magnitudes, and significance) align with those of actual human data? In this work, we tested whether LLMs can generate simulated datasets that reproduce the core findings of real randomized controlled trials (RCTs) using only the information provided in a study’s pre-registration. We assessed whether this alignment generalizes across different LLMs (ChatGPT, Gemini, Perplexity) and across distinct experimental domains, including a math reasoning task comparing student performance and a social judgment task. The LLM-simulated datasets mirrored the real data in effect direction and recovered the original patterns of statistical significance. All models correctly reproduced the direction of the human effects, though effect magnitudes varied by model: Gemini consistently overestimated effects, while Perplexity aligned most closely with the human data. While LLMs cannot replace empirical studies, they offer a powerful and flexible complement capable of accelerating idea testing, refining study designs, and probing the robustness of research findings before real-world experiments are run.
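The comparison the abstract describes, checking whether a simulated dataset matches real data in effect direction and magnitude, can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the participant scores, group labels, and the use of Cohen's d as the magnitude measure are all assumptions made for the example.

```python
# Sketch: compare an LLM-simulated two-arm RCT dataset to human data
# on effect direction and standardized magnitude (Cohen's d).
# All numbers below are illustrative, not taken from the study.
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardized mean difference; the sign gives the effect direction."""
    n1, n2 = len(treatment), len(control)
    pooled_sd = (((n1 - 1) * stdev(treatment) ** 2 +
                  (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treatment) - mean(control)) / pooled_sd


# Hypothetical per-participant scores (e.g., accuracy on a math task)
human_treat = [0.71, 0.68, 0.74, 0.80, 0.66]
human_ctrl = [0.55, 0.60, 0.52, 0.58, 0.61]
sim_treat = [0.78, 0.82, 0.75, 0.85, 0.79]   # LLM-simulated treatment arm
sim_ctrl = [0.50, 0.54, 0.49, 0.57, 0.52]    # LLM-simulated control arm

d_human = cohens_d(human_treat, human_ctrl)
d_sim = cohens_d(sim_treat, sim_ctrl)

# Direction match is the weakest alignment criterion; magnitude can still
# diverge (the paper reports Gemini overestimating effect sizes).
same_direction = (d_human > 0) == (d_sim > 0)
print(f"human d={d_human:.2f}, simulated d={d_sim:.2f}, "
      f"same direction: {same_direction}")
```

In this toy example the simulated effect points the same way as the human one but is larger, the kind of magnitude inflation the abstract attributes to Gemini.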