How Objective Source and Subjective Belief Shape the Detectability and Acceptability of LLMs’ Moral Judgments
Abstract
As large language models (LLMs) are increasingly integrated into decision-making systems—such as autonomous vehicles and medical devices—understanding how humans perceive and evaluate AI-generated judgments is crucial. To investigate this, we conducted a series of experiments in which participants evaluated justifications for moral and non-moral choices, generated either by humans or LLMs. Participants attempted to identify the source of each justification (human or LLM) and indicated their agreement with its content. We found that while detection accuracy was consistently above chance, it remained below 70%. In terms of agreement, there was no overall preference for human-generated responses; in fact, machine-generated justifications were favored in particularly challenging moral scenarios. Notably, we observed a systematic anti-AI bias: participants were less likely to agree with judgments they believed were AI-generated, regardless of the true source. Linguistic cues, such as response length, typos, first-person pronouns, and cost-benefit language markers (e.g., "lives," "save"), influenced both detection and agreement. Participants tended to disagree with cost-benefit calculations, possibly because they expected AI to favor such reasoning. These findings highlight the roles of motivated belief and ingroup/outgroup bias in shaping human evaluation of AI-generated content, particularly in morally sensitive contexts.