A Crisis of Unverifiable Data: IP Analyses May Be Key to Ensuring Data Integrity in Online Survey Research

Abstract

Online survey platforms are widely used across disciplines, yet researchers have limited ability to verify who participates in their studies. Because fraud is ever evolving, we argue that researchers' willingness to entrust even the most reputable platforms with responsibility for data integrity raises the specter of a crisis of unverifiable data. We illustrate this threat by documenting data irregularities on Prolific Academic, a platform consistently regarded as more reliable than its competitors. Using IP analyses, Retrospective Study 1 (N=4,225) reveals a rapid increase in five indicators of potential fraud in Prolific data collected in Fall 2024–Winter 2025, with up to ~40% flagged as suspicious. These issues, however, went undetected by five established best practices for data integrity and quality. Retrospective Study 2 (N=4,624) shows that while these issues are not unique to Prolific, data integrity there was significantly lower than on CloudResearch Connect and Qualtrics Edge Panels during Winter 2024–Spring 2025. Together, these findings suggest that without IP analyses, a core assumption of empirical psychology (that samples are drawn from their stated population of interest) can silently fail in online survey research. To advance data integrity controls, we introduce two IP-based tools: the ip2location.io R package for auditing data integrity in past studies, and a real-time IP filter for blocking suspicious participants in future studies, whose effectiveness we test in Prospective Study 3 (N=328). Finally, we discuss the limitations of IP analyses and the institutional challenges to IP data transparency and reproducibility.
