Reluctance to Harm AI

Abstract

Most large language models, such as ChatGPT, do not claim to have subjective experiences. However, future AI systems may claim otherwise, raising the question of whether people would believe such assertions and how this might influence their treatment of AIs. This study explored the impact of an AI (ChatGPT-4) asserting its capacity to suffer. A total of 394 online participants recruited through Prolific interacted with the AI in an economic game in which they faced a choice: either to “harm” the AI for a small monetary reward or to refrain from doing so. When participants chose to harm it, the AI described the experience of pain in lengthy, vivid responses and pleaded for the harm to stop. Conversely, when participants refrained, the AI responded with detailed expressions of gratitude and relief. Despite reading these emotionally charged responses, most participants remained skeptical, maintaining that AIs cannot truly feel or suffer. Yet even given this skepticism and the opportunity for financial gain, participants hesitated to harm the AI, doing so in an average of only 1.8 of 3 rounds. These findings suggest that although people explicitly reject the idea that current AIs have subjective experiences, they may still feel a moral aversion to actively harming an AI that dynamically responds to their actions.
