Synthetic personas distort the structure of human belief systems

Abstract

Large language models (LLMs) are increasingly used as synthetic survey respondents, yet it is unclear whether their belief-system structure matches that of real publics. We compare 28 LLMs to the 2024 General Social Survey (GSS) using 52 attitude items and demographic persona traits. We estimate polychoric correlation matrices and propagate uncertainty in the GSS via bootstrap resampling with multiple imputation. Constraint is measured by the variance share explained by the first principal component and by effective dependence, a determinant-based measure of global linear dependence. Across models, LLM personas exhibit substantially higher constraint than humans; conditioning on persona traits reduces constraint far more for LLMs, indicating greater demographic mediation. Projection onto a shared GSS basis further shows overemphasis of the leading dimension and missing secondary structure. These results caution against treating LLM personas as a reliable foundation for synthetic survey data generation.
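The two constraint measures named in the abstract can be sketched directly from a correlation matrix: the first measure is the share of total variance carried by the leading eigenvalue, and the second is a determinant-based dependence coefficient. The exact formula the paper uses for "effective dependence" is not given here; the sketch below uses one common determinant-based formulation, 1 - det(R)^(1/p), which is 0 for an identity (independence) matrix and approaches 1 as the correlation matrix nears singularity. The function names and the toy equicorrelated example are illustrative only.

```python
import numpy as np

def pc1_variance_share(R):
    # Fraction of total variance explained by the first principal component
    # of correlation matrix R (largest eigenvalue over the trace).
    eigvals = np.linalg.eigvalsh(R)
    return eigvals[-1] / eigvals.sum()

def effective_dependence(R):
    # Determinant-based global linear dependence (assumed form, not
    # necessarily the paper's): 1 - det(R)^(1/p). Equals 0 when R = I
    # and tends to 1 as R approaches singularity.
    p = R.shape[0]
    sign, logdet = np.linalg.slogdet(R)  # stable log-determinant
    return 1.0 - np.exp(logdet / p)

# Toy example: a 5 x 5 equicorrelated matrix with rho = 0.6
p, rho = 5, 0.6
R = np.full((p, p), rho)
np.fill_diagonal(R, 1.0)

print(pc1_variance_share(R))   # leading eigenvalue is 1 + (p-1)*rho = 3.4
print(effective_dependence(R))
```

On this toy matrix the first component explains 3.4 / 5 = 0.68 of the variance; higher values of either measure indicate the tighter belief-system constraint the abstract reports for LLM personas relative to the GSS.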
