Development of a Multilingual Lexicon Based on Sentiment Analysis for Low-Resource Languages


Abstract

The multilingual landscape of South Africa and the Democratic Republic of Congo (DRC) poses considerable challenges for multilingual translation because accurately labeled datasets are scarce. Existing approaches, which rely on monolingual datasets and machine translation, often fail to handle mixed-language contexts and the nuances of sentiment polarity. This study addresses these gaps by developing a multilingual lexicon, originally built for French and now extended with translations and sentiment scores for English, Afrikaans, Sepedi, and Zulu. A corpus of 3,000 words and 1,000 sentences was created, and four machine learning classifiers (random forest, support vector machine (SVM), decision tree, and Naive Bayes) were applied to the lexicon. In addition, a transformer-based model achieved 99% precision and 98% accuracy in contextual sentiment prediction. Explainable artificial intelligence (XAI) was integrated to clarify model predictions and thereby increase confidence in multilingual translation. The results demonstrate the usefulness of the lexicon for improving translation and sentiment analysis in low-resource languages, and lay a foundation for scalable AI solutions in linguistically diverse contexts.
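
To make the idea of a French-anchored lexicon with translations and sentiment scores concrete, the following is a minimal sketch, not the study's implementation. The entry layout, the score range of -1 to 1, the neutrality threshold, and the three toy entries are all assumptions made for illustration; none of them come from the study's corpus, and only English and Afrikaans translations are shown.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    """One lexicon row: a French headword, a polarity score, and translations.

    The score range [-1, 1] is an assumption for this sketch.
    """
    french: str
    score: float
    translations: dict[str, str] = field(default_factory=dict)


# Toy entries for illustration only (hypothetical scores, not the study's data).
LEXICON = [
    LexiconEntry("bon", 0.8, {"en": "good", "af": "goed"}),
    LexiconEntry("mauvais", -0.8, {"en": "bad", "af": "sleg"}),
    LexiconEntry("heureux", 0.9, {"en": "happy", "af": "gelukkig"}),
]


def build_index(entries):
    """Map every surface form, in any language, to its entry's score."""
    index = {}
    for entry in entries:
        index[entry.french.lower()] = entry.score
        for word in entry.translations.values():
            index[word.lower()] = entry.score
    return index


def sentence_score(sentence, index):
    """Average the scores of known tokens; return 0.0 if none are known."""
    tokens = [tok.strip(".,!?;:") for tok in sentence.lower().split()]
    scores = [index[tok] for tok in tokens if tok in index]
    return sum(scores) / len(scores) if scores else 0.0


def polarity(score, threshold=0.1):
    """Turn a numeric score into a label; the threshold is an assumption."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


INDEX = build_index(LEXICON)

if __name__ == "__main__":
    # A mixed English/Afrikaans sentence is scored through the shared index.
    for text in ["The food was goed!", "Dit is sleg and bad.", "Hello world"]:
        print(text, "->", polarity(sentence_score(text, INDEX)))
```

Because every surface form points to the same score, a mixed-language sentence can be scored without first identifying its language. Such a lookup ignores context and negation, which is the kind of limitation that trained classifiers and a contextual transformer model are meant to address.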
