Rethinking Think-Aloud in the Age of Language Models
Abstract
Understanding the human mind has long been a central goal of cognitive science, yet behavioral measures such as button presses provide only indirect traces of thought. Think-Aloud protocols, in which participants verbalize their reasoning, offer a more direct probe but have long been criticized as unreliable and difficult to scale. Here we revisit this debate using one of the largest Think-Aloud datasets to date: 44,038 paired transcripts and choices in risky decision-making. Our results make three contributions. First, concurrent Think-Aloud analyzed with large language models (LLMs) surpasses classic cognitive models and neural networks in predicting behavior, showing that verbal reports carry predictive information beyond behavior alone. Second, Think-Aloud generalizes across individuals from only a few examples, reflecting shared cognitive patterns. Third, Think-Aloud guides LLMs toward more human-like reasoning. Together, these findings position Think-Aloud as a scalable complement to behavioral measures, reopening a foundational debate in cognitive science and informing human-aligned AI.