Stereotypical Bias Amplification, and Reversal, in an Experimental Model of Human Interaction with Generative AI

Abstract

Stereotypical biases are readily acquired and expressed by generative AI, causing growing societal concern that these systems amplify existing human bias. This concern rests on reasonable psychological assumptions, but amplification of stereotypical bias during human-AI interaction, relative to pre-existing baseline levels, has not been demonstrated. Here, we use previous psychological work on gendered character traits to capture and control the gender stereotypes expressed in character descriptions generated by OpenAI’s ChatGPT. In four experiments (N = 782) using a first-impressions task, we find that unexplained (‘black-box’) character recommendations using stereotypical traits already exert a potent persuasive influence, significantly amplifying baseline stereotyping within first impressions. Counter-stereotypical recommendations eliminate and effectively reverse human baseline bias, but these stereotype-challenging influences propagate less well than the reinforcing influences of stereotypical recommendations. Critically, the bias amplification and reversal phenomena also occur when ChatGPT elaborates on the core stereotypical content, although ChatGPT’s explanations propagate counter-stereotypical influence more effectively and persuasively than black-box recommendations do. Our findings strongly imply that, without robust safeguards, generative AI will amplify existing bias, whereas with safeguards, existing bias can be eliminated and even reversed. Our novel approach allows such effects to be studied safely in the various contexts where gender and other bias-inducing social stereotypes operate.
