Whistleblowers can contain the unethical externalities of human-AI delegation
Abstract
When decisions are delegated to artificial intelligence (AI), human principals are more likely to implicitly request misconduct, and AI agents are more likely to comply with such requests than human agents would be. Across incentivized experiments recruiting human principals (N = 600) and three Large Language Models as AI agents, we confirm that these two mechanisms combine to generate substantial negative externalities, in the form of financial harm to a charity. In line with recent recommendations in AI ethics, we investigate the power of whistleblowers to contain these negative externalities. Whistleblowers are inside observers who are willing to incur personal costs to report unethical practices in their organizations. In an incentivized experimental set-up capturing the tensions and costs of whistleblowing (N = 300), we show that the more unethical the request from a human principal, the more likely a participant is to become a whistleblower. As a consequence, more participants, on average, become whistleblowers under AI delegation, to the point where the negative externalities of AI delegation are entirely neutralized by the behaviour of whistleblowers. These findings support recent calls to institutionalize protective mechanisms for whistleblowers within the AI sector.