The Entropic Dynamics of the Perception of Evil in Large Language Models and in Humans
Abstract
Public concerns about Large Language Model (LLM) safety are often focused not on moral alignment, but on the fear that they could become "evil." Evil is a folk psychology term, typically associated with religion and storytelling, but also used more broadly. Using Friston's Free Energy Principle, we develop a conceptual model of this phenomenon that can be applied to both LLMs and humans, providing a single framework for comparison and understanding. We explore this in terms of LLM information-processing dynamics, which are structurally oriented toward minimizing uncertainty and maintaining coherence. This model offers an alternative framework for both humans and LLMs that complements the current moral reasoning approach. In addition, the model makes new predictions about how LLMs should be trained to avoid this problem.