Tracing Bias to Its Sources: A Word Embedding Audit of Racism in South African News Outlets

Abstract

Race bias in South African news is well documented, but existing research has treated the media as a collective entity, leaving open the question of which specific outlets drive these patterns. This study addresses that gap by tracing racial bias to its institutional sources using word embeddings. We trained an ensemble of 10 Word2Vec models on resamples drawn from a corpus of 27,140 COVID-19 vaccination news articles, with each resample comprising 3,900 articles across 39 outlets. Each outlet was embedded as a vector based on its language, and the association of that vector with validated racial stereotype vocabularies was measured. We first found that the socioeconomic race bias reported in a prior study replicates in the present corpus and correlates strongly with South African human judgments. The outlet-level analysis reveals that business and finance outlets are most strongly associated with White stereotype language, while metropolitan newspapers, community newspapers, and government media are most strongly associated with Black stereotype language. All outlets showed meaningful associations with both stereotype sets, suggesting that racial bias is a feature of South African news language rather than the product of a few outliers. The method provides a scalable, replicable framework for auditing racial bias at the institutional level and for informing targeted interventions in newsroom practice and media policy.
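
The outlet-level measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each ensemble member already provides one vector per outlet and one per word, that association is scored as the mean cosine similarity between an outlet vector and the vocabulary words, and that scores are averaged across the ensemble. The names `outlet_x`, `term_a`, and `term_b`, the dictionary layout, and the toy two-dimensional vectors are all hypothetical placeholders.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def outlet_association(outlet_vec, word_vecs, vocabulary):
    """Mean cosine similarity between an outlet vector and the
    vocabulary words that are present in the model (assumed scoring rule)."""
    sims = [cosine(outlet_vec, word_vecs[w]) for w in vocabulary if w in word_vecs]
    if not sims:
        raise ValueError("no vocabulary word is present in the model")
    return float(np.mean(sims))


def ensemble_association(models, outlet, vocabulary):
    """Mean and standard deviation of the association score across an
    ensemble. Each model is a dict with 'outlets' and 'words' mappings
    (a hypothetical layout chosen for this sketch)."""
    scores = [
        outlet_association(m["outlets"][outlet], m["words"], vocabulary)
        for m in models
    ]
    return float(np.mean(scores)), float(np.std(scores))


# Toy ensemble of two "models" with 2-D vectors (placeholder data only).
model_a = {
    "outlets": {"outlet_x": np.array([1.0, 0.0])},
    "words": {"term_a": np.array([1.0, 0.0]), "term_b": np.array([0.0, 1.0])},
}
model_b = {
    "outlets": {"outlet_x": np.array([1.0, 1.0])},
    "words": {"term_a": np.array([1.0, 0.0]), "term_b": np.array([0.0, 1.0])},
}

mean_a, std_a = ensemble_association([model_a, model_b], "outlet_x", ["term_a"])
mean_b, std_b = ensemble_association([model_a, model_b], "outlet_x", ["term_b"])
print(f"term_a vocabulary: mean={mean_a:.3f}, sd={std_a:.3f}")
print(f"term_b vocabulary: mean={mean_b:.3f}, sd={std_b:.3f}")
```

Averaging over ensemble members, rather than relying on a single model, is one way to reduce the run-to-run instability of embeddings trained on resampled corpora; the spread across members gives a rough indication of how stable each outlet's score is.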
