Bias In, Symbolic Compliance Out? GPT’s Reliance on Gender and Race in Strategic Evaluations
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Organizations are increasingly using large language models (LLMs) to support strategic evaluations. We examine whether and how these systems rely on gender and race. We asked GPT to evaluate identical startup pitches varying only the founder’s name, shaping gender and race perceptions. Across 26,000 evaluations, GPT did not systematically assign lower scores to underrepresented minorities but avoided ranking them last without increasing winning likelihoods. To explain these patterns, we conducted “Second Opinion” experiments where GPT evaluated pitches alongside inputs simulating human bias. GPT more readily corrected explicit, identity-based bias than bias framed as neutral business critiques, with corrections limited in magnitude. We theorize these findings reflect symbolic compliance: LLMs suppress overt discrimination without substantively altering evaluative logic, allowing inequality to persist in AI-supported strategic evaluations.