A Bi-Directional Ge’ez-Amharic Neural Machine Translation: A Deep Learning Approach

Abstract

Due to globalization, the world is increasingly becoming one village, and human languages are crossing national boundaries. Communication gaps between speakers of different languages have traditionally been bridged by human interpreters, but because human translation is costly and inconvenient, considerable research has been devoted to Machine Translation (MT). MT is the process of automatically translating text or speech from one human language to another by computer. Neural Machine Translation uses artificial neural networks such as the Transformer, a state-of-the-art model that shows promising results over previous MT approaches. Many ancient manuscripts written in the Ge’ez language, held both in Ethiopia and abroad, still need to be translated, and young people and researchers are increasingly interested in studying Ge’ez and Amharic manuscripts. This thesis therefore aims to demonstrate the capabilities of deep learning algorithms on MT tasks for these morphologically rich languages. A bi-directional, text-based Ge’ez-Amharic MT system was evaluated on two different deep learning models: Seq2Seq with attention, and the Transformer. A total of 20,745 parallel sentences were used for the experiments, of which 13,787 were collected from previous researchers and 6,958 were newly prepared. In addition, a Ge’ez-Latin numeric corpus of 3,078 parallel lines was added to handle the translation of numerals. We conducted four experiments; the Transformer outperformed the other techniques, scoring 22.9 BLEU from Ge’ez to Amharic and 29.7 BLEU in the reverse direction on the 20,745-sentence corpus. On 13,833 parallel sentences, the basic Seq2Seq model improved on the BLEU scores of the SMT model reported by previous researchers by +0.65 (a 2.46% increase) from Ge’ez to Amharic and +0.79 (a 4.66% increase) from Amharic to Ge’ez.
Further research with a larger, cleaner corpus and pre-trained models may improve the results reported in this work; however, we faced a scarcity of corpora and pre-trained models for the Amharic and Ge’ez languages.
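Since the results above are compared via BLEU, a minimal sketch of how a sentence-level BLEU score is computed may be helpful. This is an illustrative, simplified implementation in plain Python (modified n-gram precision up to 4-grams with a brevity penalty); actual MT evaluations, presumably including those in this thesis, use corpus-level BLEU with smoothing, e.g. as implemented in sacreBLEU or NLTK.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU with uniform n-gram weights.

    candidate, reference: lists of tokens (e.g. whitespace-split words).
    Returns 0.0 if any n-gram order has no overlap (no smoothing).
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clipped (modified) n-gram matches: each candidate n-gram counts
        # at most as many times as it appears in the reference.
        overlap = sum((cand_counts & ref_counts).values())
        total = max(sum(cand_counts.values()), 1)
        if overlap == 0:
            return 0.0
        precisions.append(overlap / total)
    # Brevity penalty discourages overly short candidates.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match yields 1.0, and partial overlap yields a score strictly between 0 and 1; published scores such as the 22.9 and 29.7 above are conventionally reported on a 0-100 scale (the value here times 100).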
