Centaur May Have Learned a Shortcut that Explains Away Psychological Tasks


Abstract

In a recent landmark effort, an international collaboration of cognitive scientists produced Psych-101, the largest natural-language behavioral dataset on human cognition to date, comprising over 10 million human decisions across 160 psychological experiments. Building on this resource, the authors fine-tuned a pretrained large language model (LLM), called Centaur, to predict human choices in these experiments. While Centaur demonstrates impressive predictive performance, especially compared to domain-specific cognitive models, we find that much of its advantage stems from leveraging sequential dependencies in human choices. Over-reliance on such dependencies risks marginalizing the task-driven mechanisms that are also central to explaining human behavior. By reanalyzing the original Centaur model through controlled experiments that isolate task information from choice history, we find that Centaur outperforms domain-specific cognitive models even when no psychological task description is provided, yet underperforms on tasks where choice history is removed. These findings suggest that Centaur may have learned a shortcut that is insensitive to the psychological tasks themselves.
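The sequential-dependency shortcut described above can be illustrated with a minimal sketch (hypothetical code, not the authors' analysis): when choices are "sticky," a predictor that only sees the previous choice, with no task information whatsoever, already achieves high accuracy. All names and parameters below (e.g. `p_repeat`) are illustrative assumptions.

```python
# Hypothetical sketch: a history-only baseline (no task information)
# can predict perseverative human choices well, illustrating how
# sequential dependencies alone may drive predictive performance.
import random

random.seed(0)

def simulate_choices(n=10_000, p_repeat=0.8):
    """Simulate binary choices that repeat the last choice w.p. p_repeat."""
    choices = [random.randint(0, 1)]
    for _ in range(n - 1):
        if random.random() < p_repeat:
            choices.append(choices[-1])      # perseverate
        else:
            choices.append(1 - choices[-1])  # switch
    return choices

choices = simulate_choices()

# History-only predictor: always predict the previous choice.
history_acc = sum(c == prev for prev, c in zip(choices, choices[1:])) / (len(choices) - 1)

# History-blind baseline: always predict the overall majority choice.
majority = max(set(choices), key=choices.count)
majority_acc = sum(c == majority for c in choices) / len(choices)

print(f"history-only accuracy: {history_acc:.2f}")  # ~p_repeat, i.e. ~0.80
print(f"majority baseline:     {majority_acc:.2f}")
```

In this toy setting, the history-only predictor approaches the perseveration rate while the task- and history-blind baseline stays near chance, which is the kind of gap that controlled ablations of task information versus choice history are designed to expose.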
