Rethinking Think-Aloud in the Age of Language Models
Abstract
Understanding the human mind has long been a central goal of cognitive science, yet behavioral measures such as button presses provide only indirect traces of thought. Think-Aloud protocols, in which participants verbalize their reasoning, offer a more direct probe but have long been criticized as unreliable and difficult to scale. Here we revisit this debate using one of the largest Think-Aloud datasets to date: 44,038 paired transcripts and choices in risky decision-making. Our results make three contributions. First, concurrent Think-Aloud analyzed with large language models (LLMs) surpasses classic cognitive models and neural networks in predicting behavior, showing that verbal reports carry predictive information beyond behavior alone. Second, Think-Aloud generalizes across individuals from only a few examples, reflecting shared cognitive patterns. Third, Think-Aloud guides LLMs toward more human-like reasoning. Together, these findings position Think-Aloud as a scalable complement to behavioral measures, reopening a foundational debate in cognitive science and informing human-aligned AI.