Large language models outperform humans at estimating society's everyday norms—but hybrids are even better

Kimmo Eriksson
Simon Karlsson
Irina Vartanova
Pontus Strimling


Abstract

As AI assistants and social robots enter human environments, their ability to navigate context-dependent social norms is essential to avoid harm and ensure successful collaboration. We evaluate six large language models (LLMs) on their ability to estimate American social norms across 555 everyday scenarios (measured in prior work) and compare these to estimates from 320 humans. LLMs achieve remarkably high accuracy, clearly outperforming the average human. However, the errors LLMs make are systematic; they are similar across runs of the same LLM and even across different LLMs. As a consequence of this homogeneity, aggregating estimates of LLMs produces little improvement. Individual humans make much worse estimates, often defaulting to extreme right-or-wrong judgments even when asked to estimate population averages, but their errors are idiosyncratic and, consequently, aggregating their estimates yields dramatic improvement through wisdom-of-crowds effects. As humans make different errors than LLMs, hybrid ensembles combining both substantially outperform either alone.
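The aggregation argument in the abstract can be illustrated with the standard formula for the variance of an average of equally correlated errors. The sketch below is not the authors' analysis: the function name `ensemble_error_variance` and all numeric inputs are hypothetical, chosen only to show why averaging helps little when errors are shared and a great deal when they are idiosyncratic. It assumes every estimator has the same error variance and the same non-negative pairwise error correlation.

```python
def ensemble_error_variance(sigma2: float, rho: float, n: int) -> float:
    """Error variance of the mean of n estimates.

    Assumes (for illustration only) that each estimate has error variance
    sigma2 and that every pair of errors has the same correlation rho,
    with 0 <= rho <= 1. Under those assumptions:
        Var(mean) = sigma2 * (rho + (1 - rho) / n)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] for this sketch")
    if sigma2 < 0.0:
        raise ValueError("sigma2 must be non-negative")
    return sigma2 * (rho + (1.0 - rho) / n)


if __name__ == "__main__":
    # Hypothetical values, NOT taken from the paper:
    # accurate but highly correlated estimators (LLM-like) ...
    correlated = ensemble_error_variance(sigma2=1.0, rho=0.9, n=6)
    # ... versus noisy but mostly independent estimators (human-like).
    idiosyncratic = ensemble_error_variance(sigma2=4.0, rho=0.1, n=320)
    print(f"correlated ensemble:    {correlated:.3f}")
    print(f"idiosyncratic ensemble: {idiosyncratic:.3f}")
```

As `n` grows, the aggregate error variance approaches `sigma2 * rho`, so shared errors set a floor that more estimators of the same kind cannot lower. Under these illustrative inputs, the larger and noisier but weakly correlated group ends up with the smaller aggregate error, which is the qualitative pattern the abstract describes.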

Version published to 10.21203/rs.3.rs-9161239/v1 on Research Square
Mar 30, 2026

