A Multi-Factor Evaluation of Fidelity in LLM-Based Cross-Cultural Surrogates

Da-Wei Zhang
Anjie Cao
Yitian Yang
Haolin Jin
Ling Chen
Kimberlee Weatherall
Huaming Chen

Abstract

Large language models (LLMs) have been proposed as surrogate participants for cross-cultural research, offering a scalable alternative to resource-intensive human data collection. Confidence in such simulations, however, depends both on the robustness of the human benchmarks and on the methodological choices used to generate model responses. The present study uses a controlled experimental design to systematically evaluate the methodological fidelity of LLM-based cross-cultural simulation on a well-established and highly replicable cognitive paradigm: the taxonomic-thematic similarity task, which reliably distinguishes U.S. and Chinese samples. We implemented a 2 × 2 × 2 factorial design manipulating Prompt Framing (participant-level vs. distributional), Model Identity (a single general-purpose model vs. culturally aligned models), and Model Quantity (single vs. multiple models). For each of the eight resulting configurations, we simulated 100 U.S. and 100 Chinese participants, repeated this over 30 independent runs, and benchmarked the simulated outcomes against large-scale human data. Simulation fidelity was evaluated in terms of effect size alignment, reliability across replications, and the semantic coherence of model-generated explanations. Five configurations reproduced the established China-U.S. cultural contrast, and one of them closely matched the human effect magnitude. Distributional prompting and culturally aligned models consistently improved fidelity, whereas aggregating across heterogeneous models increased reliability but attenuated the effect size. Semantic analyses further revealed culturally differentiated reasoning patterns in the models' explanations, alongside occasional logical inconsistencies. These results indicate that LLMs can reproduce robust cross-cultural cognitive effects at the group level when the methodological design is carefully controlled, and they identify clear boundary conditions for using LLMs as cross-cultural research surrogates.
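The abstract names the three factors and the evaluation criteria but not the exact scoring or statistics. The following is a minimal sketch under stated assumptions, not the authors' pipeline: each simulated participant is assumed to yield a proportion of taxonomic choices, the China-U.S. contrast is assumed to be measured as Cohen's d with a pooled standard deviation (U.S. minus Chinese, so the expected direction is positive), and reliability is approximated by the spread and sign consistency of d across runs. The factor labels, the function names `design_conditions`, `cohens_d` and `summarize_runs`, and the `human_d` benchmark input are all hypothetical.

```python
import itertools
import math
import statistics

# The three two-level factors named in the abstract (labels paraphrased, hypothetical).
FACTORS = {
    "prompt_framing": ("participant-level", "distributional"),
    "model_identity": ("general-purpose", "culturally aligned"),
    "model_quantity": ("single", "multiple"),
}


def design_conditions():
    """Return all 2 x 2 x 2 = 8 factor combinations as dicts."""
    names = list(FACTORS)
    return [dict(zip(names, levels)) for levels in itertools.product(*FACTORS.values())]


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a - group_b) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def summarize_runs(runs, human_d):
    """Summarize one condition's independent replications.

    runs: list of (us_scores, cn_scores) pairs, one pair per run; each score is
          an assumed per-participant proportion of taxonomic choices.
    human_d: effect size from the human benchmark (hypothetical input).
    """
    ds = [cohens_d(us, cn) for us, cn in runs]
    mean_d = statistics.mean(ds)
    return {
        "mean_d": mean_d,
        # Spread of d across runs: a simple proxy for reliability.
        "sd_d": statistics.stdev(ds) if len(ds) > 1 else 0.0,
        # Share of runs whose effect points in the expected (U.S. > China) direction.
        "sign_consistency": sum(d > 0 for d in ds) / len(ds),
        # Distance between the simulated and human effect sizes.
        "abs_error": abs(mean_d - human_d),
    }


if __name__ == "__main__":
    print(len(design_conditions()))  # 8
    # Toy scores for illustration only; these are not data from the study.
    toy_runs = [
        ([0.6, 0.7, 0.8], [0.4, 0.5, 0.6]),
        ([0.65, 0.7, 0.75], [0.5, 0.55, 0.6]),
    ]
    print(summarize_runs(toy_runs, human_d=1.0))
```

In a setup like the one described, `summarize_runs` would be called once per condition with 30 runs of 100 scores per group; comparing `abs_error` and `sd_d` across the eight conditions mirrors the abstract's distinction between effect size alignment and reliability.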

Version published to 10.21203/rs.3.rs-8799167/v1 on Research Square, Feb 9, 2026

Related articles

Synthetic Participants Generated by Large Language Models: A Systematic Literature Review

This article has 3 authors:
1. Eduard Kuric
2. Peter Demcak
3. Matus Krajcovic
This article has no evaluations. Latest version Mar 10, 2026

The Potential of a FrameVR.io Metaverse Environment to Enhance Cognitive Engagement, Reduce Digital Burnout, and Develop Creative Writing Skills

This article has 4 authors:
1. Mohamed Mekheimer
2. Zeinab Amin
3. Mohamed Hasan Khalaf
4. Eman Mahdy
This article has no evaluations. Latest version Feb 17, 2026

Uses and Misuses of Large Language Models in Qualitative Research

This article has 1 author:
1. Jonathan Ben-Menachem
This article has no evaluations. Latest version Mar 17, 2026
