Do large language models Dunning-Kruger?
Abstract
Human judgments typically show an overconfidence pattern, especially among low performers (i.e., the Dunning-Kruger effect, or DKE; Kruger & Dunning, 1999). This effect extends to a variety of metacognitive judgment tasks, including judgments of learning (JOLs) and judgments of associative memory (JAMs). Across these tasks, perceived relatedness produces inflated memory predictions. The present study explored whether large language models (LLMs) exhibit a similar effect. We used a modified JAM task based on Maxwell and Buchanan (2020), which assessed three types of word pair relationships: associative (i.e., the probability of a cue eliciting a target in free association tasks), semantic (i.e., the degree of feature overlap), and thematic (i.e., the likelihood of co-occurrence within the same narrative context). This judgment of relatedness (JOR) task was adapted for BERT using surprisal values, which estimate how expected a target word is given a cue. LLM JOR patterns were then compared to the human JAMs first reported by Maxwell and Buchanan (2020). Overall, humans showed the classic overestimation pattern (high intercepts and shallow slopes). BERT surprisals, by contrast, were minimally sensitive to differences in word relations (i.e., near-zero or negative slopes) yet showed greater bias than humans (i.e., elevated intercepts). Taken together, our findings suggest that LLMs do not exhibit the classic DKE pattern, though they are prone to overestimation: instead, they appear to uniformly overestimate word pair relatedness.
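To make the surprisal measure concrete, here is a minimal sketch (not the authors' code) of how a BERT-style masked language model can score how expected a target word is given a cue. The model checkpoint (bert-base-uncased), the "cue [MASK]" prompt template, and the restriction to single-token targets are all simplifying assumptions for illustration; the preprint's actual procedure may differ.

```python
# Minimal sketch: BERT "surprisal" of a target given a cue.
# Assumptions (not from the preprint): bert-base-uncased, a
# "cue [MASK]" probe, and targets that map to a single wordpiece.
import math

import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def surprisal(cue: str, target: str) -> float:
    """Return -log2 P(target | cue) at a masked position after the cue."""
    inputs = tokenizer(f"{cue} {tokenizer.mask_token}", return_tensors="pt")
    # Locate the [MASK] position in the tokenized input.
    mask_idx = (
        inputs["input_ids"][0] == tokenizer.mask_token_id
    ).nonzero(as_tuple=True)[0].item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_idx]
    probs = torch.softmax(logits, dim=-1)
    target_id = tokenizer.convert_tokens_to_ids(target)  # assumes one wordpiece
    return -math.log2(probs[target_id].item())

# Lower surprisal = target is more expected given the cue, which can be
# read as a higher model "judgment of relatedness" (JOR).
print(surprisal("doctor", "nurse"))   # related pair: lower surprisal
print(surprisal("doctor", "banana"))  # unrelated pair: higher surprisal
```

Under this reading, a model whose surprisals barely move between related and unrelated pairs (near-zero slope) while remaining low overall (elevated intercept, once surprisal is inverted into a relatedness score) would produce exactly the uniform-overestimation pattern the abstract describes.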