A Multi-Factor Evaluation of Fidelity in LLM-Based Cross-Cultural Surrogates
Abstract
Large language models (LLMs) have been proposed as surrogate participants for cross-cultural research, offering a scalable alternative to resource-intensive human data collection. Confidence in such simulations, however, depends on both the robustness of the human benchmarks and the methodological choices used to generate model responses. The present study uses a controlled experimental design to evaluate the methodological fidelity of LLM-based cross-cultural simulation with a well-established and highly replicable cognitive paradigm: the taxonomic-thematic similarity task, which reliably distinguishes U.S. and Chinese samples. We implemented a 2 × 2 × 2 factorial design manipulating Prompt Framing (participant-level vs. distributional), Model Identity (single general-purpose model vs. culturally aligned models), and Model Quantity (single vs. multiple models). In each of the eight experimental configurations, we simulated 100 U.S. and 100 Chinese participants, repeated over 30 independent runs, and benchmarked simulated outcomes against large-scale human data. Simulation fidelity was evaluated in terms of effect size alignment, reliability across replications, and the semantic coherence of model-generated explanations. Five of the eight configurations reproduced the established China-U.S. cultural contrast, with one configuration closely matching the human effect magnitude. Distributional prompting and culturally aligned models consistently improved fidelity, whereas aggregating across heterogeneous models increased reliability but attenuated effect size. Semantic analyses further revealed culturally differentiated reasoning patterns in explanations, alongside occasional logical inconsistencies.
These results indicate that LLMs can reproduce robust cross-cultural cognitive effects at the group level when methodological design is carefully controlled, while also identifying clear boundary conditions for their use as cross-cultural research surrogates.
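To make the scale of the design concrete, the following is a minimal Python sketch that enumerates the 2 × 2 × 2 configurations and counts simulated participants. The factor and level labels are shorthand for the names used in the abstract, and the total assumes each of the 30 runs draws a fresh sample of 100 participants per culture. The abstract does not name its effect size metric, so the `cohens_d` helper (a pooled-SD standardized mean difference) is an assumption for illustration only, not the authors' stated method.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev

# Factor levels as named in the abstract; the short labels are illustrative.
FACTORS = {
    "prompt_framing": ("participant_level", "distributional"),
    "model_identity": ("general_purpose", "culturally_aligned"),
    "model_quantity": ("single", "multiple"),
}
CULTURES = ("US", "China")
N_PER_CULTURE = 100  # simulated participants per culture, per configuration
N_RUNS = 30          # independent runs per configuration


def configurations():
    """Return the 2 x 2 x 2 design as a list of dicts, one per configuration."""
    names = list(FACTORS)
    return [dict(zip(names, levels)) for levels in product(*FACTORS.values())]


def total_simulated_participants():
    """Configurations x cultures x participants per culture x runs."""
    return len(configurations()) * len(CULTURES) * N_PER_CULTURE * N_RUNS


def cohens_d(group_a, group_b):
    """Standardized mean difference with a pooled SD (assumed metric)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
```

Under these assumptions the design yields 8 configurations and 8 × 2 × 100 × 30 = 48,000 simulated participants in total. A helper such as `cohens_d` could then be applied to per-participant task scores from the two cultural groups within each configuration and compared against the human benchmark effect.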