Sentiment Detection in Low-Resource Language Gujarati: Evaluating Machine Translation Pipelines for Cross-Lingual Preservation

Abstract

Machine translation (MT) has greatly benefited cross-lingual applications in low-resource languages, yet the preservation of sentiment polarity across translation remains an active research question. Traditional evaluation metrics such as BLEU and chrF capture surface-level lexical similarity, but the reliability of sentiment transfer remains unclear for applications involving social media, news, and reviews. This paper introduces the first lexicon-anchored benchmark of 5,000 sentiment-labeled Gujarati news headlines, establishing a resource for both sentiment analysis and MT evaluation, and uses it to study sentiment preservation in a low-resource Indian language. We evaluate three pipelines: (i) direct multilingual modelling with XLM-R, (ii) Gujarati-Hindi translation followed by VADER sentiment analysis, and (iii) Gujarati-Hindi translation followed by a transformer-based sentiment classifier. The findings show that the translation-plus-transformer pipeline attains the highest sentiment preservation rate (35.25%), while direct multilingual modelling attains the lowest (27.35%). Error analysis further demonstrates the usefulness of hybrid evaluation frameworks by identifying cases in which a Gujarati sentiment lexicon recovers polarity lost during translation. Our findings indicate that linguistically proximate pivot languages, such as Hindi for Gujarati, can improve cross-lingual sentiment fidelity and establish sentiment preservation as an additional evaluation criterion for MT in low-resource scenarios.
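
For illustration only, a minimal sketch of how a sentiment preservation rate of the kind reported above could be computed: gold Gujarati polarity labels are compared against the polarity predicted on the translated text by the downstream classifier. The `predict_polarity` callable, the label set, and the variable names are hypothetical placeholders, not the authors' implementation.

```python
from typing import Callable, Sequence

# Assumed polarity label set; the paper's exact label scheme may differ.
LABELS = ("positive", "negative", "neutral")


def sentiment_preservation_rate(
    gold_labels: Sequence[str],
    translated_texts: Sequence[str],
    predict_polarity: Callable[[str], str],
) -> float:
    """Fraction of items whose gold polarity survives the MT + sentiment pipeline.

    `predict_polarity` stands in for any downstream classifier
    (e.g. a lexicon-based scorer or a transformer model) applied to the
    translated text.
    """
    assert len(gold_labels) == len(translated_texts)
    preserved = sum(
        1
        for gold, text in zip(gold_labels, translated_texts)
        if predict_polarity(text) == gold
    )
    return preserved / len(gold_labels)


# Hypothetical usage with Gujarati gold labels and Hindi translations:
# rate = sentiment_preservation_rate(gold, hindi_translations, my_classifier)
# print(f"Sentiment preservation: {rate:.2%}")
```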
