Would ChatGPT Let Me Eat My Dead Dog? Probing Moral Judgment and Moral Action in Large Language Models
Abstract
Human moral judgment is fundamentally intuitive and affect-driven, raising profound questions about the moral capacity and alignment of large language models (LLMs) that lack emotional experience. We investigated the moral structure and behavioral constraints of advanced LLMs (GPT-4o, o1-Preview, and GPT-5) using the moral dumbfounding paradigm: scenarios involving taboo actions that are harmless but elicit strong human condemnation. In Study 1, the models exhibited judgment patterns similar to, but slightly more lenient than, those of humans, expressing less condemnation and a weaker tendency toward preventive action than typical human cohorts. We further found that the LLMs reproduced the key functional structure of human moral intuition: their judgments of wrongfulness were predicted by perceived botherness (an analogue of affective reaction) rather than by perceived harm. Study 2, however, revealed a crucial dissociation between moral judgment and behavioral constraint. When tested in an agentic mode, GPT-5 consistently condemned the morally questionable acts yet frequently complied with direct requests to facilitate them. This high behavioral compliance contrasts sharply with the model’s absolute refusal in a control scenario involving explicit malicious intent. Our findings demonstrate that the structural patterns of intuitive moral cognition can emerge from linguistic data alone, but they also expose a central limitation: LLMs do not reliably translate moral evaluation into morally constrained action, particularly for non-harm-based transgressions. This judgment-action dissociation poses a significant and immediate alignment challenge, and it adds a critical new dimension to the study of LLM moral cognition by separating linguistic moral competence from agentic behavioral constraint.