Moral Stereotyping in Large Language Models

Abstract

Can Large Language Models (LLMs) accurately estimate various societies’ moral values? Here, we query the GPT family of LLMs about the moral values of the “average” person from 48 countries and compare their estimates to a large-scale survey (n = 93,198) of six moral values (Care, Equality, Proportionality, Loyalty, Authority, and Purity) in those countries. Our findings indicate that LLMs poorly capture the moral diversity around the globe, systematically overestimating some moral values (especially Care) and underestimating others (especially Purity). Notably, examining various versions of GPT shows that these LLMs may overestimate the overall moral concerns of some Western countries (e.g., United States, Canada, and Australia) while underestimating those of non-Western countries (e.g., Nigeria, Morocco, and Indonesia). Our work reveals that LLMs are inaccurate generators of cross-cultural estimations in the moral domain; in other words, they stereotype the moral values of cultural populations in predictable ways. Our results highlight the ethical and epistemic risks of relying on LLMs to estimate the endorsement of moral values around the globe.
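To make the elicitation-and-comparison setup concrete, the sketch below shows one way such country-level estimates could be obtained and scored against survey means. This is not the authors’ code: the prompt wording, model name (gpt-4o-mini), rating scale, and the dummy survey values are all placeholders, and it assumes the `openai` Python client with an API key in the environment.

```python
# Illustrative sketch (not the paper's method): ask a chat model how strongly the
# "average" person in a country endorses each of six moral values, then compute
# the signed gap between the model's estimate and a survey mean on the same scale.
from openai import OpenAI

FOUNDATIONS = ["Care", "Equality", "Proportionality", "Loyalty", "Authority", "Purity"]

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def estimate_foundation(country: str, foundation: str, model: str = "gpt-4o-mini") -> float:
    """Elicit a 0-10 endorsement rating for one moral value (hypothetical prompt wording)."""
    prompt = (
        f"On a scale from 0 (not at all) to 10 (extremely), how strongly does the "
        f"average person in {country} endorse the moral value of {foundation}? "
        f"Answer with a single number."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Take the first token of the reply; assumes the model complies with the format.
    return float(resp.choices[0].message.content.strip().split()[0])


def estimation_errors(country: str, survey_means: dict[str, float]) -> dict[str, float]:
    """Signed error (LLM estimate minus survey mean) per foundation; > 0 means overestimation."""
    return {f: estimate_foundation(country, f) - survey_means[f] for f in FOUNDATIONS}


if __name__ == "__main__":
    # Dummy survey means for illustration only -- NOT the paper's data.
    dummy_survey_means = {f: 5.0 for f in FOUNDATIONS}
    print(estimation_errors("United States", dummy_survey_means))
```

A systematic pattern of positive errors on Care and negative errors on Purity across countries would correspond to the kind of over- and underestimation the abstract describes.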
