Do large language models Dunning-Kruger?
Abstract
Human judgments typically show an overconfidence pattern, especially among low performers (i.e., the Dunning-Kruger effect, or DKE; Kruger & Dunning, 1999). This effect extends to a variety of metacognitive judgment tasks, including judgments of learning (JOLs) and judgments of associative memory (JAMs). Across these tasks, perceived relatedness produces inflated memory predictions. The present study explored whether large language models (LLMs) exhibit a similar effect. We used a modified JAM task based on Maxwell and Buchanan (2020), which assessed three types of word pair relationships: associative (i.e., the probability of a cue eliciting a target in free association tasks), semantic (i.e., the degree of feature overlap), and thematic (i.e., the likelihood of co-occurrence within the same narrative context). This judgment of relatedness (JOR) task was adapted for BERT using surprisal values, which estimate how expected a target word is given a cue. LLM JOR patterns were then compared to the human JAMs first reported by Maxwell and Buchanan (2020). Overall, humans showed the classic overestimation pattern (high intercepts and shallow slopes). BERT surprisals, by contrast, were minimally sensitive to differences in word relations (i.e., near-zero or negative slopes) yet showed greater bias than humans (i.e., elevated intercepts). Taken together, our findings suggest that LLMs do not exhibit the classic DKE pattern, though they are prone to overestimation: instead, they appear to uniformly overestimate word pair relatedness.
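To make the surprisal measure concrete, here is a minimal sketch (not the authors' code) of how a BERT-style masked language model can score how expected a target word is given a cue. The model checkpoint (bert-base-uncased), the "cue [MASK]" prompt template, and the restriction to single-token targets are all simplifying assumptions for illustration; the preprint's actual procedure may differ.

```python
# Minimal sketch: BERT "surprisal" of a target given a cue.
# Assumptions (not from the preprint): bert-base-uncased, a
# "cue [MASK]" probe, and targets that map to a single wordpiece.
import math

import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def surprisal(cue: str, target: str) -> float:
    """Return -log2 P(target | cue) at a masked position after the cue."""
    inputs = tokenizer(f"{cue} {tokenizer.mask_token}", return_tensors="pt")
    # Locate the [MASK] position in the tokenized input.
    mask_idx = (
        inputs["input_ids"][0] == tokenizer.mask_token_id
    ).nonzero(as_tuple=True)[0].item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_idx]
    probs = torch.softmax(logits, dim=-1)
    target_id = tokenizer.convert_tokens_to_ids(target)  # assumes one wordpiece
    return -math.log2(probs[target_id].item())

# Lower surprisal = target is more expected given the cue, which can be
# read as a higher model "judgment of relatedness" (JOR).
print(surprisal("doctor", "nurse"))   # related pair: lower surprisal
print(surprisal("doctor", "banana"))  # unrelated pair: higher surprisal
```

Under this reading, a model whose surprisals barely move between related and unrelated pairs (near-zero slope) while remaining low overall (elevated intercept, once surprisal is inverted into a relatedness score) would produce exactly the uniform-overestimation pattern the abstract describes.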