Detection and Analysis of Offensive Online Content in Hausa Language

Fatima Muhammad Adam
Abubakar Yakubu Zandam
Isa Inuwa-Dutse

Abstract

Hausa, a major Chadic language spoken by over 100 million people in Africa, is widely used online yet remains a low-resource language from a computational linguistics perspective: few resources and tools exist for analysing Hausa text, which makes offensive and threatening language online difficult to detect. Our study aims to bridge this gap. We conducted two user studies (n = 180) to understand cyberbullying in Hausa, and then created the first dataset of offensive and threatening Hausa phrases for training detection systems. We developed a system to flag such content and compared it with Google Translate's ability to detect these terms. Our findings reveal a concerning trend: offensive and threatening language is prevalent online, especially in discussions about religion and politics. Our detection system detected more than 70% of offensive and threatening content, although many of these terms were mistranslated by Google's translation engine. We attribute this to the subtle relationship between offensive or threatening content and idiomatic expressions in Hausa, which highlights the importance of accounting for cultural nuance and idiom. To create a safer online environment for Hausa speakers, we recommend involving diverse stakeholders who understand local contexts and demographics, enabling more accurate detection systems and better-targeted moderation strategies. Trigger Warning: Readers may find some of the terms in this study distressing or disturbing; all examples are for illustration only.
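The abstract does not say how the detection system works. As a purely illustrative sketch, assuming a simple lexicon-matching approach (not necessarily the authors' method), the code below flags a post when a listed phrase appears as a whole-word sequence, which also catches multiword idioms that single-word matching would miss. `LEXICON`, `normalise`, `flag` and `detection_rate` are hypothetical names, and the lexicon entries are placeholders rather than terms from the study's dataset.

```python
import re
from typing import Iterable

# Hypothetical lexicon: placeholder strings stand in for offensive and
# threatening Hausa phrases, which are deliberately not reproduced here.
LEXICON: dict[str, str] = {
    "placeholder offensive phrase": "offensive",
    "placeholder threat idiom": "threatening",
}


def normalise(text: str) -> str:
    """Lowercase, turn punctuation into spaces and collapse runs of whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so letters like ɓ, ɗ, ƙ survive
    return re.sub(r"\s+", " ", text).strip()


def flag(text: str, lexicon: dict[str, str] = LEXICON) -> set[str]:
    """Return the labels of every lexicon phrase found as a whole-word sequence."""
    padded = f" {normalise(text)} "
    return {
        label
        for phrase, label in lexicon.items()
        if f" {normalise(phrase)} " in padded  # padding prevents partial-word matches
    }


def detection_rate(posts: Iterable[str], lexicon: dict[str, str] = LEXICON) -> float:
    """Share of posts (assumed to be labelled offensive) that receive at least one flag."""
    posts = list(posts)
    if not posts:
        return 0.0
    hits = sum(1 for post in posts if flag(post, lexicon))
    return hits / len(posts)
```

For example, `detection_rate(["placeholder offensive phrase!", "unlisted wording", "Placeholder threat idiom here"])` flags two of the three posts. Here `detection_rate` is one plausible reading of a figure such as "detected more than 70%"; the abstract does not state the exact metric the authors used.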

Version published to 10.21203/rs.3.rs-4266465/v2 on Research Square (Apr 26, 2024)
Version published to 10.21203/rs.3.rs-4266465/v1 on Research Square (Apr 19, 2024)

Related articles

Vectorization and Sentiment Analysis of Arabizi Text

This article has 4 authors:
1. noha youssef
2. Sama Gouda
3. Farida Madkour
4. Mona Ibrahim
This article has no evaluations. Latest version Jan 19, 2026

Can large language models effectively reshape online implicit hate speech? An integrative modelling approach

This article has 6 authors:
1. Yinghui Huang
2. Qixia Feng
3. Hui Liu
4. Weiqing Li
5. Ying Ma
6. Zongkui Zhou
This article has no evaluations. Latest version Jan 14, 2026

Exploration of Large Language Models for Geotagging of Social Media Posts

This article has 2 authors:
1. Riwaz Udas
2. Richard Sinnott
This article has no evaluations. Latest version Feb 3, 2026