Relation Extraction (RE) Model for Afaan Oromo Text Using Self-Attention Mechanisms

Abstract

This study proposes a novel Relation Extraction (RE) model for Afaan Oromo, focusing on automatically identifying semantic relationships between entities in text. The model leverages multilingual BERT (mBERT) embeddings combined with entity-pair features, including sentence-level distances and lexical similarity, to capture both local and global context. Each entity pair is processed by a self-attention encoder, followed by pooling and a fully connected classification layer that predicts one of 15 predefined relation classes, such as Person-Location, Person-Organization, and Organization-Date. The model was trained and evaluated on a dataset of 10,000 annotated Afaan Oromo sentences covering diverse domains, including educational, administrative, and cultural texts. Experimental results show high performance across all 15 relation classes: an overall accuracy of 96.3%, precision of 95.8%, recall of 96.1%, and an F1-score of 95.9%. The confusion matrix shows strong diagonal dominance, confirming precise class discrimination. The approach addresses key challenges of low-resource languages, such as limited corpora and complex morphology, and provides a robust foundation for downstream natural language processing applications, including question answering, knowledge graph construction, and information retrieval. The proposed system achieves state-of-the-art results for Afaan Oromo text and can be adapted to other low-resource languages with similar linguistic characteristics.
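To make the described pipeline concrete, the following is a minimal PyTorch sketch of the architecture the abstract outlines: mBERT embeddings, injected entity-pair features, a self-attention encoder, pooling, and a 15-way classifier. This is an illustrative reconstruction, not the authors' published implementation; the class name, the two-scalar pair-feature layout (distance and lexical similarity), the feature-injection strategy, and the mean-pooling choice are all assumptions.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class AfaanOromoREModel(nn.Module):
    """Hypothetical sketch of the abstract's pipeline: mBERT token
    embeddings plus entity-pair features, a self-attention encoder,
    pooling, and a fully connected 15-way relation classifier."""

    def __init__(self, num_relations=15, pair_feature_dim=2, hidden_dim=768):
        super().__init__()
        # mBERT backbone providing contextual token embeddings
        self.backbone = AutoModel.from_pretrained("bert-base-multilingual-cased")
        # Project the scalar entity-pair features (assumed here to be
        # sentence-level distance and lexical similarity) into hidden space
        self.pair_proj = nn.Linear(pair_feature_dim, hidden_dim)
        # Additional self-attention encoder over the token sequence
        self.self_attn = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=8, batch_first=True
        )
        self.classifier = nn.Linear(hidden_dim, num_relations)

    def forward(self, input_ids, attention_mask, pair_features):
        # Contextual embeddings from mBERT: (batch, seq_len, hidden_dim)
        tokens = self.backbone(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Add the projected pair features to every token position
        # (one plausible injection strategy among several)
        tokens = tokens + self.pair_proj(pair_features).unsqueeze(1)
        # Self-attention encoding followed by mean pooling
        encoded = self.self_attn(tokens)
        pooled = encoded.mean(dim=1)
        # Logits over the 15 predefined relation classes
        return self.classifier(pooled)
```

Under these assumptions, training would proceed as standard multi-class classification, e.g. cross-entropy loss over the 15 relation labels with fine-tuning of the mBERT backbone.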
