Sentiment Detection in Low-Resource Language Gujarati: Evaluating Machine Translation Pipelines for Cross-Lingual Preservation

Abstract

Machine translation (MT) has greatly benefited cross-lingual applications in low-resource languages, yet the preservation of sentiment polarity across translation remains an active research question. Traditional evaluation metrics such as BLEU and chrF capture surface-level lexical similarity, but the reliability of sentiment transfer remains unclear for applications involving social media, news, and reviews. This paper introduces the first lexicon-anchored benchmark of 5,000 sentiment-labeled Gujarati news headlines, establishing a resource for both sentiment analysis and MT evaluation, and uses it to study sentiment preservation in a low-resource Indian language. We evaluate three pipelines: (i) direct multilingual modelling with XLM-R, (ii) Gujarati-Hindi translation followed by VADER sentiment analysis, and (iii) Gujarati-Hindi translation followed by a transformer-based sentiment classifier. The findings show that the translation-plus-transformer pipeline attains the highest sentiment preservation rate (35.25%), while direct multilingual modelling attains the lowest (27.35%). Error analysis further demonstrates the usefulness of hybrid evaluation frameworks by identifying cases in which a Gujarati sentiment lexicon recovers polarity lost during translation. Our findings indicate that linguistically proximate pivot languages, such as Hindi for Gujarati, can improve cross-lingual sentiment fidelity and establish sentiment preservation as an additional evaluation criterion for MT in low-resource scenarios.
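
For illustration only, a minimal sketch of how a sentiment preservation rate of the kind reported above could be computed: gold Gujarati polarity labels are compared against the polarity predicted on the translated text by the downstream classifier. The `predict_polarity` callable, the label set, and the variable names are hypothetical placeholders, not the authors' implementation.

```python
from typing import Callable, Sequence

# Assumed polarity label set; the paper's exact label scheme may differ.
LABELS = ("positive", "negative", "neutral")


def sentiment_preservation_rate(
    gold_labels: Sequence[str],
    translated_texts: Sequence[str],
    predict_polarity: Callable[[str], str],
) -> float:
    """Fraction of items whose gold polarity survives the MT + sentiment pipeline.

    `predict_polarity` stands in for any downstream classifier
    (e.g. a lexicon-based scorer or a transformer model) applied to the
    translated text.
    """
    assert len(gold_labels) == len(translated_texts)
    preserved = sum(
        1
        for gold, text in zip(gold_labels, translated_texts)
        if predict_polarity(text) == gold
    )
    return preserved / len(gold_labels)


# Hypothetical usage with Gujarati gold labels and Hindi translations:
# rate = sentiment_preservation_rate(gold, hindi_translations, my_classifier)
# print(f"Sentiment preservation: {rate:.2%}")
```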
