Rethinking Think-Aloud in the Age of Language Models
Abstract
Understanding the human mind has long been a central goal of cognitive science, yet behavioral measures such as button presses provide only indirect traces of thought. Think-Aloud protocols, in which participants verbalize their reasoning, offer a more direct probe but have long been criticized as unreliable and difficult to scale. Here we revisit this debate using one of the largest Think-Aloud datasets to date: 44,038 paired transcripts and choices in risky decision-making. Our results make three contributions. First, concurrent Think-Aloud transcripts analyzed with large language models (LLMs) surpass classic cognitive models and neural networks in predicting behavior, showing that verbal reports carry generative information beyond behavior alone. Second, Think-Aloud-based prediction generalizes across individuals from only a few examples, reflecting shared cognitive patterns. Third, Think-Aloud guides LLMs toward more human-like reasoning. Together, these findings position Think-Aloud as a scalable complement to behavioral measures, reopening a foundational debate in cognitive science and informing human-aligned AI.