Dual-BERT Adversarial Model for Text Normalization in Hausa User-Generated Contents
Abstract
This paper presents a Dual-BERT Generative Adversarial Network framework for improving text normalization in low-resource languages, specifically Hausa. By combining Bidirectional Encoder Representations from Transformers (BERT) with Generative Adversarial Networks (GANs), the model outperforms conventional Transformer-based and standalone GAN baselines on Exact Match, Word Error Rate (WER), Character Error Rate (CER), and BLEU score. Experimental results show an Exact Match of 0.80 and a notable reduction in error rates across all metrics. The approach strengthens NLP tools for under-represented languages, particularly in noisy, informal text such as social media posts.
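To make the architecture concrete, below is a minimal sketch of what a "dual-BERT" adversarial setup for text normalization could look like. Everything here is an assumption for illustration: the abstract does not specify model sizes, tokenization, or training details, so the sketch uses small BERT-style transformer encoders as stand-ins for the two BERT components (a generator that maps noisy Hausa tokens to normalized tokens, and a discriminator that scores whether a sequence looks like clean text) and a soft-embedding trick to keep the adversarial loss differentiable through the generator's discrete outputs.

```python
# Hypothetical sketch of a Dual-BERT adversarial normalizer (not the
# authors' implementation). Sizes and losses are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB, DIM, MAXLEN, BATCH = 8000, 256, 64, 4  # hypothetical sizes

def bert_like_encoder():
    # Small transformer encoder stack standing in for a BERT encoder.
    layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=2)

class Generator(nn.Module):
    """BERT #1: maps noisy token ids to per-token vocabulary logits."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.enc = bert_like_encoder()
        self.out = nn.Linear(DIM, VOCAB)

    def forward(self, noisy_ids):
        return self.out(self.enc(self.embed(noisy_ids)))

class Discriminator(nn.Module):
    """BERT #2: pooled real/fake logit for a (soft or hard) token sequence."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.enc = bert_like_encoder()
        self.cls = nn.Linear(DIM, 1)

    def forward(self, ids_or_embeds):
        x = ids_or_embeds
        if x.dtype == torch.long:        # token ids -> embeddings
            x = self.embed(x)
        return self.cls(self.enc(x).mean(dim=1))

G, D = Generator(), Discriminator()
g_opt = torch.optim.Adam(G.parameters(), lr=2e-5)
d_opt = torch.optim.Adam(D.parameters(), lr=2e-5)
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

# Placeholder batch; a real pipeline would tokenize noisy/clean Hausa pairs.
noisy = torch.randint(0, VOCAB, (BATCH, MAXLEN))
clean = torch.randint(0, VOCAB, (BATCH, MAXLEN))

# Discriminator step: gold normalizations are "real", generator output "fake".
with torch.no_grad():
    fake_soft = torch.softmax(G(noisy), dim=-1) @ D.embed.weight  # soft embeds
d_loss = bce(D(clean), torch.ones(BATCH, 1)) + \
         bce(D(fake_soft), torch.zeros(BATCH, 1))
d_opt.zero_grad(); d_loss.backward(); d_opt.step()

# Generator step: supervised normalization loss plus adversarial loss.
logits = G(noisy)
fake_soft = torch.softmax(logits, dim=-1) @ D.embed.weight
g_loss = ce(logits.reshape(-1, VOCAB), clean.reshape(-1)) + \
         bce(D(fake_soft), torch.ones(BATCH, 1))
g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```

The soft-embedding projection (`softmax(logits) @ embed.weight`) is one common way to let GAN gradients flow through discrete text; whether the paper uses this, Gumbel-softmax, or a policy-gradient objective is not stated in the abstract.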