Relation Extraction (RE) Model for Afaan Oromo Text Using Self-Attention Mechanisms

Abstract

This study proposes a novel Relation Extraction (RE) model for Afaan Oromo, focusing on automatically identifying semantic relationships between entities in text. The model leverages multilingual BERT (mBERT) embeddings combined with entity-pair features, including sentence-level distances and lexical similarity, to capture both local and global context. Each entity pair is processed by a self-attention encoder, followed by pooling and a fully connected classification layer that predicts one of 15 predefined relation classes, such as Person-Location, Person-Organization, and Organization-Date. The model was trained and evaluated on a dataset of 10,000 annotated Afaan Oromo sentences covering diverse domains, including educational, administrative, and cultural texts. Experimental results show high performance across all 15 relation classes: an overall accuracy of 96.3%, precision of 95.8%, recall of 96.1%, and an F1-score of 95.9%. The confusion matrix shows strong diagonal dominance, confirming precise class discrimination. The approach addresses key challenges of low-resource languages, such as limited corpora and complex morphology, and provides a robust foundation for downstream natural language processing applications, including question answering, knowledge graph construction, and information retrieval. The proposed system achieves state-of-the-art results for Afaan Oromo text and can be adapted to other low-resource languages with similar linguistic characteristics.
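To make the described pipeline concrete, the following is a minimal PyTorch sketch of the architecture the abstract outlines: mBERT embeddings, injected entity-pair features, a self-attention encoder, pooling, and a 15-way classifier. This is an illustrative reconstruction, not the authors' published implementation; the class name, the two-scalar pair-feature layout (distance and lexical similarity), the feature-injection strategy, and the mean-pooling choice are all assumptions.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class AfaanOromoREModel(nn.Module):
    """Hypothetical sketch of the abstract's pipeline: mBERT token
    embeddings plus entity-pair features, a self-attention encoder,
    pooling, and a fully connected 15-way relation classifier."""

    def __init__(self, num_relations=15, pair_feature_dim=2, hidden_dim=768):
        super().__init__()
        # mBERT backbone providing contextual token embeddings
        self.backbone = AutoModel.from_pretrained("bert-base-multilingual-cased")
        # Project the scalar entity-pair features (assumed here to be
        # sentence-level distance and lexical similarity) into hidden space
        self.pair_proj = nn.Linear(pair_feature_dim, hidden_dim)
        # Additional self-attention encoder over the token sequence
        self.self_attn = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=8, batch_first=True
        )
        self.classifier = nn.Linear(hidden_dim, num_relations)

    def forward(self, input_ids, attention_mask, pair_features):
        # Contextual embeddings from mBERT: (batch, seq_len, hidden_dim)
        tokens = self.backbone(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Add the projected pair features to every token position
        # (one plausible injection strategy among several)
        tokens = tokens + self.pair_proj(pair_features).unsqueeze(1)
        # Self-attention encoding followed by mean pooling
        encoded = self.self_attn(tokens)
        pooled = encoded.mean(dim=1)
        # Logits over the 15 predefined relation classes
        return self.classifier(pooled)
```

Under these assumptions, training would proceed as standard multi-class classification, e.g. cross-entropy loss over the 15 relation labels with fine-tuning of the mBERT backbone.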
