Using Large Language Models to Estimate Belief Strength in Reasoning
Abstract
Accurately quantifying belief strength in heuristics-and-biases tasks is crucial yet methodologically challenging. In this paper, we introduce an automated method that leverages large language models (LLMs) to systematically measure and manipulate belief strength. We tested this method in the widely used “lawyer-engineer” base-rate neglect task, in which stereotypical descriptions (e.g., someone who enjoys mathematical puzzles) conflict with normative base-rate information (e.g., engineers make up a very small percentage of the sample). Using this approach, we created an open-access database of over 100,000 unique items that vary systematically in stereotype-driven belief strength. Validation studies demonstrate that our LLM-derived belief strength measure correlates strongly with human typicality ratings and robustly predicts human choices in a base-rate neglect task. Our method also revealed substantial, previously unnoticed variability in stereotype-driven belief strength among popular base-rate items from existing research, underscoring the need to control for this variability in future studies. We further highlight methodological improvements achievable by refining the LLM prompt, as well as ways to enhance cross-cultural validity. The database presented here serves as a powerful resource for researchers, facilitating rigorous, replicable, and theoretically precise experimental designs and enabling advances in cognitive and computational modeling of reasoning. To support its use, we provide the R package baserater, which allows researchers to access the database and to apply or adapt the method in their own research.
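To illustrate how the database might be accessed in practice, below is a minimal R sketch of one plausible workflow with baserater. The function name load_database() and the column name belief_strength are hypothetical placeholders for illustration, not the package's documented interface; consult the package documentation for the actual API.

# Minimal usage sketch; the function and column names below are
# hypothetical placeholders, not the documented baserater API.
library(baserater)

# Hypothetical: load the full item database as a data frame
items <- load_database()

# Hypothetical: keep only items whose stereotype-driven belief
# strength exceeds a cutoff (the 0-100 scale is assumed here
# purely for illustration)
strong_items <- subset(items, belief_strength > 90)
head(strong_items)

A filtering step of this kind would let researchers assemble item sets matched on belief strength, in line with the paper's recommendation to control for this variability in future study designs.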