Synthetic respondents and the illusion of human data

Abstract

Online behavioral research assumes that coherent, internally consistent responses indicate useful human data. This assumption is now challenged by autonomous AI agents, which achieve 99.8% pass rates on attention checks while generating psychometrically sound, hypothesis-confirming data indistinguishable from careful human work. The contamination is already widespread (approximately one-third of survey participants report using AI assistance) and produces systematic bias that registers as signal rather than noise. Traditional detection methods fail because they target low-effort humans, whereas AI produces high-effort patterns. Economic incentives are powerful, and the threat is recursive: AI-mediated responses become training data for models that mediate future responses, creating feedback loops toward model-shaped consensus. The field requires contamination-aware statistical frameworks and stratified sampling by evidential role, treating online convenience samples as exploratory while demanding robust recruitment for confirmatory inference.
