Bias In, Symbolic Compliance Out? GPT’s Reliance on Gender and Race in Strategic Evaluations

Read the full article See related articles

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Organizations are increasingly using large language models (LLMs) to support strategic evaluations. We examine whether and how these systems rely on gender and race. We asked GPT to evaluate identical startup pitches varying only the founder’s name, shaping gender and race perceptions. Across 26,000 evaluations, GPT did not systematically assign lower scores to underrepresented minorities but avoided ranking them last without increasing winning likelihoods. To explain these patterns, we conducted “Second Opinion” experiments where GPT evaluated pitches alongside inputs simulating human bias. GPT more readily corrected explicit, identity-based bias than bias framed as neutral business critiques, with corrections limited in magnitude. We theorize these findings reflect symbolic compliance: LLMs suppress overt discrimination without substantively altering evaluative logic, allowing inequality to persist in AI-supported strategic evaluations.

Article activity feed