Development of a Multilingual Lexicon Based on Sentiment Analysis for Low-Resource Languages

Mike Nkongolo

Abstract

The multilingual landscape of South Africa and the Democratic Republic of Congo (DRC) poses considerable challenges for multilingual translation because accurately labeled datasets are scarce. Existing approaches, which rely on monolingual datasets and machine translation, often fail to handle mixed-language contexts and nuances of sentiment polarity. This study addresses these gaps by developing a multilingual lexicon, originally designed for French, that has been enriched with translations and sentiment scores for English, Afrikaans, Sepedi, and Zulu. A corpus of 3,000 words and 1,000 sentences was created, and machine learning classifiers (random forests, support vector machines (SVM), decision trees, and Naive Bayes) were applied to the lexicon. The study also uses a transformer-based model, which achieved 99% precision and 98% accuracy in contextual sentiment prediction. Explainable artificial intelligence (XAI) was integrated to clarify model predictions, improving confidence in multilingual translation. The results demonstrate the usefulness of the lexicon for low-resource language translation and sentiment analysis and lay a foundation for scalable AI solutions in linguistically diverse contexts.
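The abstract describes the lexicon only at a high level. As a minimal sketch, assuming each entry pairs a word's translations with a single polarity score in [-1, 1] (the paper's actual schema and scoring rule are not given here), a lexicon lookup with a simple averaging scorer might look like the following. `LexiconEntry`, `build_index`, `sentence_score`, `polarity`, the toy entries, and their scores are all hypothetical; Sepedi (`nso`) and Zulu (`zu`) forms are left out of the toy entries rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LexiconEntry:
    """One concept in a hypothetical multilingual sentiment lexicon."""

    # Surface forms keyed by language code, e.g. "fr", "en", "af", "nso", "zu".
    forms: dict[str, str]
    # Polarity score in [-1.0, 1.0]; the values used below are illustrative only.
    score: float


# Toy entries for illustration; these are NOT taken from the paper's lexicon.
LEXICON = [
    LexiconEntry({"fr": "bon", "en": "good", "af": "goed"}, 0.7),
    LexiconEntry({"fr": "mauvais", "en": "bad", "af": "sleg"}, -0.7),
    LexiconEntry({"fr": "excellent", "en": "excellent", "af": "uitstekend"}, 0.9),
]


def build_index(entries):
    """Map (language, word) -> score, plus (None, word) -> score for
    language-agnostic lookups in code-switched text."""
    index: dict[tuple[Optional[str], str], float] = {}
    for entry in entries:
        for lang, word in entry.forms.items():
            key = word.lower()
            index[(lang, key)] = entry.score
            # If two entries share a surface form, the later one wins here.
            index[(None, key)] = entry.score
    return index


def sentence_score(sentence: str, lang: Optional[str], index) -> float:
    """Average the scores of lexicon words found in the sentence.

    Pass lang=None to match words from any language (mixed-language input).
    Words not in the lexicon are ignored; a sentence with no hits scores 0.0.
    """
    tokens = [t.strip(".,!?;:").lower() for t in sentence.split()]
    hits = [index[(lang, t)] for t in tokens if (lang, t) in index]
    return sum(hits) / len(hits) if hits else 0.0


def polarity(score: float, threshold: float = 0.05) -> str:
    """Bucket a score into a label; the threshold is an arbitrary choice."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    index = build_index(LEXICON)
    print(polarity(sentence_score("Le film est bon.", "fr", index)))
    print(polarity(sentence_score("Die kos is uitstekend!", "af", index)))
    # Code-switched French/English sentence, matched across all languages.
    print(polarity(sentence_score("Le service est bad", None, index)))
```

Averaging keeps scores comparable across sentences of different lengths, but it is only one possible aggregation rule. The classifiers and the transformer mentioned in the abstract would instead learn from labeled data, with lexicon scores serving as features or labels; this sketch covers only the lookup step.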

Version published on Research Square (DOI 10.21203/rs.3.rs-6864363/v1), Jun 11, 2025

Related articles

Advancing Sentiment Analysis in Gujarati: Performance Enhancement through a Hybrid Annotation Framework

This article has 2 authors:
1. Neha Shah¹
2. Preeti Baser²
This article has no evaluations. Latest version Jan 6, 2026

An Evaluation Framework for Dialectal Sentiment Classification and Linguistic Phenomena in Large Language Models

This article has 5 authors:
1. Tarek Rashed
2. Ramadan Alfared
3. Abduelbaset Goweder
4. Husien Alhammi
5. Abubaker Kashada
This article has no evaluations. Latest version Dec 24, 2025

Vectorization and Sentiment Analysis of Arabizi Text

This article has 4 authors:
1. noha youssef
2. Sama Gouda
3. Farida Madkour
4. Mona Ibrahim
This article has no evaluations. Latest version Jan 19, 2026