Chatbots Are Undermining Crowdsourced Research in the Behavioral Sciences: Detecting AI-Assisted Cheating with a Keystroke-Based Tool
Abstract
Generative AI poses a significant threat to data integrity on crowdsourcing platforms like Prolific, which behavioral scientists widely rely on for data collection. Large language models (LLMs) allow users to generate fluent and relevant responses to open-ended questions, which can mask inattention and compromise experimental validity. To empirically estimate the prevalence of this behavior, we analyzed keystroke data from three studies (N = 928) on Prolific between May and July 2025. Using an embedded JavaScript tool, we flagged participants who pasted text or whose keystroke count was anomalously low compared to their response length. For each flagged participant, we manually compared their detected keystrokes to their final response to determine if the text could have been plausibly typed. This process confirmed that, despite deterrence measures, approximately 9% of all participants submitted AI-assisted responses. These participants significantly outperformed non-cheaters (by up to 1.5 SDs), were over twice as likely to share geolocations with other participants (suggesting possible VPN or proxy use), and exhibited lower reliability on questionnaire scales. Simulated power analyses indicate that this level of undetected cheating can diminish observed effect sizes by 10% and inflate required sample sizes by as much as 30%. These findings highlight the urgent need for new detection methods like keystroke logging, which offers verifiable evidence of cheating that is difficult to obtain from manual review of LLM-generated text alone. As AI continues to evolve, maintaining data quality in crowdsourced research will require active monitoring, methodological adaptation, and communication between researchers and data collection platforms.
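To make the flagging logic concrete, the sketch below shows how a browser-side script might record keystrokes, detect paste events, and compare the keystroke count against the length of the submitted text. The element id, event handling, and the 0.5 keystroke-to-length threshold are illustrative assumptions for this sketch, not the tool or cutoff used in the studies described above.

```javascript
// Minimal sketch of keystroke-based flagging, assuming a survey page with a
// textarea whose id is "response" (id and threshold are hypothetical).

const responseBox = document.getElementById("response");

let keystrokes = 0;   // count of character-producing key presses
let pasted = false;   // set to true if any paste event fires

responseBox.addEventListener("keydown", (event) => {
  // Count only keys that insert a single printable character,
  // so modifier keys and shortcuts do not inflate the count.
  if (event.key.length === 1) {
    keystrokes += 1;
  }
});

responseBox.addEventListener("paste", () => {
  pasted = true;
});

// Called at submission time: flag responses that were pasted or whose
// keystroke count is implausibly low relative to the final text length.
function flagResponse() {
  const finalLength = responseBox.value.length;
  const ratio = finalLength > 0 ? keystrokes / finalLength : 1;
  return {
    pasted,
    keystrokes,
    finalLength,
    flagged: pasted || ratio < 0.5, // 0.5 is an illustrative threshold
  };
}
```

As in the procedure summarized above, a flag of this kind would only be a first pass: each flagged record would still be compared manually against the recorded keystrokes to judge whether the final text could plausibly have been typed.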