Content Moderation in the Global South: A Comparative Study of Four Low-Resource Languages
Abstract
Over the past 18 months, the Center for Democracy and Technology (CDT) has studied how content moderation systems operate across multiple regions of the Global South, focusing on South Asia, North and East Africa, and South America. Our team examined four languages: the Maghrebi Arabic dialects (Elswah, 2024a), Kiswahili (Elswah, 2024b), Tamil (Bhatia & Elswah, 2025), and Quechua (Thakur, 2025). These languages and dialects are considered "low resource" because little training data is available to build equitable and accurate AI models for them. We conducted this work in close collaboration with regional civil society organizations in the Global South, which helped us understand the local dynamics of their digital environments. Content moderation remains largely closed to public scrutiny: outside observers see only the information technology companies choose to disclose. Our findings advance the scientific and policy communities' understanding of content moderation and its challenges in the Global South. The data in this report also deepen our understanding of the Global South's information environment, which remains understudied in current scholarship.