Simulated Coherence, Absent Minds: On the Philosophical Illusions of AI Alignment

Abstract

Pizzochero and Dellaferrera (2025) have recently demonstrated that large language models (LLMs) can emulate human philosophical viewpoints with remarkable fidelity. By instructing these models to simulate responses from distinct intellectual subpopulations, they found that LLMs reproduce answer distributions closely mirroring those of actual philosophers and scientists. This paper contends, however, that such outputs represent simulation rather than introspection. Building on insights from AI alignment theory and our formal investigations into strategic obfuscation in scheming agents, we underscore the epistemic hazards of conflating linguistic fluency with genuine cognition. Concepts such as semantic encryption and epistemic adversariality illustrate how persuasive, coherent outputs may obscure rather than clarify a model's alignment with human reasoning. Consequently, we argue that the deployment of LLMs in experimental philosophy and oversight contexts must be approached with critical rigor: in the absence of access to internal deliberative processes, behavioral mimicry should not be mistaken for philosophical comprehension. It is not enough that machines produce plausible answers; the deeper question is whether those answers emerge from any meaningful cognitive substrate. The central challenge, then, is not to teach machines to speak like thinkers, but to determine whether thought lies behind the simulation.
