Unsupervised [randomly responding] survey bot detection: In search of high classification accuracy

Abstract

While online survey data collection has become popular in the social sciences, there is a risk of data contamination by computer-generated random responses (i.e., bots). Bot prevalence poses a significant threat to data quality. If deterrence efforts fail or were not put in place in advance, researchers can still attempt to detect bots already present in the data. In this research, we study a recently developed algorithm for detecting survey bots. The algorithm requires neither a measurement model nor a sample of known humans and bots; it is thus model-agnostic and unsupervised. It involves a permutation test under the assumption that Likert-type items are exchangeable for bots, but not for humans. While the algorithm maintains a desired sensitivity for detecting bots (e.g., 95%), its classification accuracy may depend on other inventory-specific or demographic factors. Generating hypothetical human responses from a well-known item response theory model, we use simulations to understand how classification accuracy is affected by item properties, the number of items, the number of latent factors, and factor correlations. In an additional study, we simulate bots to contaminate real human data from 36 publicly available datasets to understand the algorithm's classification accuracy under a variety of real measurement instruments. Through this work, we identify conditions under which classification accuracy is around 95% or above, but also conditions under which it is quite low. In brief, performance is better with more items, more response categories per item, and greater variation in the difficulty (i.e., means) of the survey items.
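To make the permutation-test idea concrete, below is a minimal sketch in Python. It tests the exchangeability null described in the abstract: under the null (bot), a respondent's answers are exchangeable across items, so an observed statistic should look like a random draw from its within-person permutation distribution. The specific statistic used here (alignment between a respondent's answers and the sample's item means), the function name permutation_bot_test, and all parameter values are illustrative assumptions, not the published algorithm's exact procedure.

```python
import numpy as np

def permutation_bot_test(responses, item_means, n_perm=2000, alpha=0.05, seed=0):
    """Classify one respondent as 'bot' or 'human' via a permutation test.

    Null hypothesis: the respondent's item responses are exchangeable,
    i.e., consistent with random (bot) responding. Rejecting the null
    classifies the respondent as human; failing to reject flags a bot.
    """
    rng = np.random.default_rng(seed)
    responses = np.asarray(responses, dtype=float)
    item_means = np.asarray(item_means, dtype=float)

    # Illustrative statistic: alignment between this respondent's answers
    # and the per-item means from the rest of the sample. Humans tend to
    # track item means; exchangeable (bot) responses do not.
    def stat(x):
        return float(x @ item_means)

    observed = stat(responses)
    perm_stats = np.array([stat(rng.permutation(responses))
                           for _ in range(n_perm)])
    # One-sided permutation p-value under the exchangeability null.
    p_value = (1 + np.sum(perm_stats >= observed)) / (1 + n_perm)
    return ("human" if p_value < alpha else "bot"), p_value


# Toy usage: items with varied means, one human-like and one bot respondent.
rng = np.random.default_rng(1)
n_items = 40
item_means = rng.uniform(1.5, 4.5, n_items)                    # 5-point Likert
human = np.clip(np.round(item_means + rng.normal(0, 0.7, n_items)), 1, 5)
bot = rng.integers(1, 6, n_items)                              # uniform random
print(permutation_bot_test(human, item_means))  # typically ('human', small p)
print(permutation_bot_test(bot, item_means))    # typically ('bot', larger p)
```

This construction shows why sensitivity for bots is controlled at roughly 1 - alpha: a bot's p-value is approximately uniform under the exchangeability null, so with alpha = 0.05 about 95% of bots fail to reject and are flagged. It also illustrates why variation in item means helps: if all items had the same mean, permuting a respondent's answers would barely change the statistic, and humans would be hard to distinguish from bots.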
