AI Agent Prevalence and Data Quality Across Multiple Online Sample Providers
Abstract
Online recruitment platforms have become the dominant infrastructure for behavioral research, yet data quality concerns have acquired new urgency with the emergence of large language models (LLMs). Recent work showing that LLM-based agents can complete surveys while evading standard quality checks has prompted alarm about synthetic respondents infiltrating samples at scale. However, demonstrating agent capability is not equivalent to demonstrating ecosystem-level deployment, and variation in quality among human respondents across platform types may be a more consequential threat. We address both questions in a single pre-registered study: (1) what is the actual prevalence of AI agents across platforms, and (2) how does human data quality vary across structural market segments? We recruited 5,200 respondents across 13 conditions from 10 platforms spanning direct first-party panels, hybrid networks, and marketplace aggregators. Agent detection employed an automated environment check that achieved perfect discrimination in pilot testing, plus a secondary battery of six behavioral indicators. Human quality was assessed across seven behavioral dimensions alongside metadata including device type, ecosystem activity, and cost efficiency. Agent detections were concentrated almost exclusively on Amazon MTurk (11–16%), with all other platforms at or below 1%; detected responses showed profiles more consistent with traditional bots than with LLM-based agents. We found evidence of humans using LLMs to augment their answers, particularly on open-ended or difficult items, at rates consistent with recent work that assumes no deployed mitigation. Human data quality varied substantially by platform type: direct panels outperformed hybrid platforms, which outperformed marketplace platforms, across nearly all measures, an effect several times larger than that of agents or LLM augmentation.
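The abstract does not disclose the specifics of the automated environment check. As a minimal illustrative sketch only, a common approach flags client-side automation signals such as the `navigator.webdriver` property, an empty language list, or a headless user-agent string; the signal names and schema below are assumptions, not the study's actual instrument:

```python
def looks_automated(env: dict) -> bool:
    """Flag a session exhibiting common browser-automation signals.

    `env` is a dict of client-side signals (hypothetical schema);
    the paper's actual environment check is not specified here.
    """
    if env.get("webdriver"):          # navigator.webdriver is set by automation tools
        return True
    if not env.get("languages"):      # headless browsers often report no languages
        return True
    ua = env.get("user_agent", "").lower()
    return "headless" in ua           # e.g. "HeadlessChrome" in the UA string
```

A real check would combine many such signals and be validated against known-human traffic, as the pilot testing described above implies.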
Cost-efficiency analyses revealed that direct panels, despite higher nominal costs, were the most economical once quality thresholds were applied. The field's most pressing data quality challenge remains systematic variation in human respondent quality by platform type, not AI agent infiltration.
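The cost-efficiency logic can be sketched simply: dividing nominal cost per response by the share of responses that survive quality screening yields the effective cost per usable response. The figures below are hypothetical for illustration, not the study's results:

```python
def effective_cost_per_usable_response(nominal_cost: float, pass_rate: float) -> float:
    """Cost per response that passes quality thresholds."""
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return nominal_cost / pass_rate

# Hypothetical illustration: a pricier direct panel with a high pass rate
# can undercut a cheap marketplace with a low pass rate.
direct = effective_cost_per_usable_response(3.00, 0.95)       # ≈ 3.16 per usable response
marketplace = effective_cost_per_usable_response(1.50, 0.40)  # 3.75 per usable response
```

Under these assumed numbers the direct panel is cheaper per usable response despite double the nominal price, mirroring the pattern the analyses report.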