Six fallacies in substituting large language models for human participants


Abstract

Can AI systems like large language models (LLMs) replace human participants in behavioral and psychological research? Here I critically evaluate the “replacement” perspective and identify six interpretive fallacies that undermine its validity. These fallacies are: (1) equating token prediction with human intelligence, (2) treating LLMs as the average human, (3) interpreting alignment as explanation, (4) anthropomorphizing AI systems, (5) essentializing identities, and (6) substituting model data for human evidence. Each fallacy represents a potential misunderstanding about what LLMs are and what they can tell us about human cognition. The analysis distinguishes levels of similarity between LLMs and humans, particularly functional equivalence (outputs) versus mechanistic equivalence (processes), while highlighting both technical limitations (addressable through engineering) and conceptual limitations (arising from fundamental differences between statistical and biological intelligence). For each fallacy, specific safeguards are provided to guide responsible research practices. Ultimately, the analysis supports conceptualizing LLMs as pragmatic simulation tools—useful for role-play, rapid hypothesis testing, and computational modeling (provided their outputs are validated against human data)—rather than as replacements for human participants. This framework enables researchers to leverage language models productively while respecting the fundamental differences between machine intelligence and human thought.
