Bias In, Symbolic Compliance Out? GPT’s Reliance on Gender and Race in Strategic Evaluations
Abstract
Strategic decision-making often involves more candidates than can be thoroughly assessed, leading evaluators to rely on proxies such as gender and race, which disadvantages underrepresented minorities (URMs). As large language models (LLMs) such as OpenAI’s ChatGPT are increasingly adopted by organizations, we ask whether and how LLMs rely on gender and race in evaluations. Across 26,000 evaluations of innovative offerings (e.g., startup pitches), we find that GPT evaluators did not disadvantage URMs and even modestly supported them, primarily by avoiding negative outcomes. We theorize that this reflects symbolic compliance: a superficial response aimed at avoiding overt discrimination rather than a genuine commitment to fairness. We test this mechanism through “Second Opinion” experiments, in which LLMs evaluate alongside simulated human inputs. This study highlights the implications of LLM adoption in strategic evaluations.