Detection and Analysis of Offensive Online Content in Hausa Language

Abstract

Hausa, a major Chadic language spoken by over 100 million people in Africa, is widely used yet is considered a low-resource language from a computational linguistics perspective. Few resources and tools exist for analysing Hausa text, which makes offensive and threatening language online difficult to detect. Our study aimed to bridge this gap. We conducted two user studies (n = 180) to understand cyberbullying in Hausa. We then created the first dataset of offensive and threatening Hausa phrases for training detection systems. We developed a system to flag such content and compared it with Google translation's ability to detect these terms. Our findings show that offensive and threatening language is prevalent online, especially in discussions about religion and politics. Our detection system flagged more than 70% of offensive and threatening content, whereas Google's translation engine mistranslated many of these terms. We attribute this to the subtle relationship between offensive and threatening content and idiomatic expressions in Hausa, which highlights the importance of accounting for cultural nuances and idioms. To create a safer online environment for Hausa speakers, we recommend involving diverse stakeholders who understand local contexts and demographics. This will allow for the development of more accurate detection systems and targeted moderation strategies.

Trigger Warning: Readers may find some of the terms in this study distressing or disturbing; all examples are for illustration only.
