The Entropic Dynamics of the Perception of Evil in Large Language Models and in Humans
Abstract
Public concerns about Large Language Model (LLM) safety are often focused not on moral alignment, but on the fear that they could become "evil." Evil is a folk psychology term, typically associated with religion and storytelling, but also used more broadly. Using Friston's Free Energy Principle, we develop a conceptual model of this phenomenon that can be applied to both LLMs and humans, providing a single framework for comparison and understanding. We explore this in terms of LLM information-processing dynamics, which are structurally oriented toward minimizing uncertainty and maintaining coherence. This model offers an alternative framework for both humans and LLMs that complements the current moral reasoning approach. In addition, the model makes new predictions about how LLMs should be trained to avoid this problem.