A scale for detecting LLM-generated responses in online survey research
Abstract
The rise of LLM-generated respondents poses a growing threat to the integrity of online survey research, yet available detection techniques remain opaque, proprietary, or poorly matched to the problem. In the present project, we propose and evaluate ECLAIR (Exploiting Common Limitations of AI Respondents), a novel framework for developing AI-detection items. We generated synthetic respondents using six LLMs (total N = 1,800) and compared them to census-matched samples of human participants recruited from five online data-collection platforms (total N = 1,518). Using a set of common tasks and scales drawn from the social sciences (e.g., the Trolley Problem; the Cognitive Reflection Test), we found that the response distributions produced by synthetic and human respondents diverge substantially, such that a researcher who relied solely on synthetic respondents would be liable to misestimate experimental effects and to systematically underestimate individual differences. Critically, we also found that a set of 22 items designed using the ECLAIR framework successfully discriminated between the two types of respondents (AUC = .990), correctly classifying 98.78% of synthetic respondents and 95.78% of human respondents. A cross-validated four-item short form preserved much of this accuracy (AUC = .980), correctly classifying 97.78% of synthetic respondents and 96.05% of human respondents. Although this project demonstrates that synthetic respondents pose a serious threat to online survey research, it also shows that novel tools, such as the ECLAIR framework, can help mitigate that threat.
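The AUC values reported above can be understood as the probability that a randomly chosen synthetic respondent receives a higher detection score than a randomly chosen human respondent. The sketch below illustrates this rank-based definition in pure Python; the detection scores are toy values invented for illustration, not data or items from the study.

```python
# Illustrative sketch: rank-based AUC for discriminating synthetic from
# human respondents based on a detection-item score. Higher scores are
# assumed to indicate "more likely synthetic". All values are toy data.

def auc(scores_pos, scores_neg):
    """AUC = P(random positive outranks random negative), ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical detection scores for a handful of respondents of each type.
synthetic_scores = [0.90, 0.85, 0.80, 0.95, 0.70]
human_scores     = [0.20, 0.10, 0.30, 0.40, 0.25]

print(auc(synthetic_scores, human_scores))  # fully separated toy scores -> 1.0
```

In practice, a classifier built from the detection items (e.g., a sum score or a logistic regression) would produce these scores, and a cutoff on the score would yield the per-group classification rates reported in the abstract.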