Amharic Language Hate Speech Detection on Social Media

Ermias Tadesse
Beyene Kassa
Tarekegn Walle

Abstract

Social media platforms enable rapid communication, information sharing, and the expression of opinion. However, their misuse for hate speech targeting race, religion, and political differences has become a growing concern. This issue is particularly sensitive for underrepresented languages such as Amharic, the working language of Ethiopia and the Semitic language with the second-largest number of speakers after Arabic. This study addresses the challenge of detecting hate speech in Amharic text by analyzing posts and comments from Facebook, YouTube, and Twitter. A dataset of 7,590 labeled entries was collected using the Facepager tool, covering hate speech related to race, religion, and politics, as well as neutral content. The dataset was annotated with guidance from researchers, legal experts, and language specialists. Preprocessing techniques, including data cleaning, tokenization, and normalization, were applied, and features were extracted using embedding layers. The dataset was split into training (80%), validation (10%), and testing (10%) sets. Several deep learning models (LSTM, BiLSTM, GRU, BiGRU, and RoBERTa) were developed and evaluated using precision, recall, F1-score, and accuracy. The RoBERTa model outperformed the others, achieving an accuracy of 91%. This research highlights the effectiveness of advanced deep learning techniques in detecting Amharic hate speech, offering a valuable tool for mitigating this critical issue in Ethiopian social media contexts.
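
As a rough illustration of the evaluation setup described in the abstract, the sketch below shows one way an 80/10/10 split and the four reported metrics could be computed. It is not the authors' code. The function names `split_80_10_10` and `macro_scores`, the fixed random seed, the plain shuffled (non-stratified) split, and the use of macro averaging across classes are all assumptions made for illustration; the abstract does not specify any of them.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle a copy of `items` and cut it into train/validation/test parts.

    Assumption: a plain shuffled split; the abstract does not say whether
    the authors stratified by class.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10   # 80% for training
    n_val = len(shuffled) // 10         # 10% for validation
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # remainder (about 10%) for testing
    return train, val, test


def macro_scores(y_true, y_pred, labels):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Assumption: macro averaging (each class weighted equally).
    """
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# The four classes named in the abstract.
LABELS = ["race", "religion", "politics", "neutral"]

# Splitting 7,590 placeholder indices, matching the dataset size in the abstract.
train, val, test = split_80_10_10(range(7590))
print(len(train), len(val), len(test))
```

With 7,590 entries, integer arithmetic gives 6,072 training, 759 validation, and 759 test examples. In practice, `macro_scores` would be called on the held-out test labels and a model's predictions for them.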

Version published to 10.20944/preprints202503.0820.v1
Mar 11, 2025

Related articles

PANDA – Paired Anti-hate Narratives Dataset from Asia: Using an LLM-as-a-Judge to Create the First Chinese Counterspeech Dataset

This article has 6 authors:
1. Michael Bennie
2. Demi Zhang
3. Bushi Xiao
4. Chryseis Xinyi Liu
5. Jian Meng
6. Alayo Tripp
This article has no evaluations
Latest version Feb 19, 2025

Natural Language Processing (NLP) Techniques for Afan Oromo Text Analysis

This article has 1 author:
1. Ruth Olagbende
This article has no evaluations
Latest version Mar 12, 2025

Machine Learning Techniques for Fake News Detection

This article has 2 authors:
1. Eunice Oyedokun
2. Barnty William
This article has no evaluations
Latest version Mar 5, 2025
