Emotion Preservation in Romanian-English Automatic Translations
Abstract
When working with low-resource languages in Natural Language Processing (NLP), researchers sometimes rely on automatic translation into English to take advantage of the many language processing methods available for English. After the English text is processed, the resulting information is transferred back to the source language. This approach is also commonly used for automatic emotion classification. However, a major concern with this technique is that the meaning of the translated texts, and the emotions they convey, may become distorted. We provide the first study on Romanian-English translations and examine whether, and how, emotions change due to automatic translation. To do so, we fine-tune transformer-based models on the Romanian Emotion Detection Dataset v2 (REDv2), which contains Romanian tweets with multi-label annotations for 7 emotions. We repeat this process on two automatically translated English versions of REDv2 and on two additional versions of the translated datasets whose test sets have been re-annotated, making them the first Romanian-English datasets for emotion distortion. We also build feature-engineered models that incorporate translation quality indicators for each translated REDv2 text, using machine translation quality estimation data provided by TransQuest. In addition, we perform emotion detection with generative models on the REDv2 test set and its translated versions. Lastly, we conduct quantitative and qualitative statistical analyses to draw our final conclusions.
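One simple way to quantify how emotion labels change under translation is to compare the multi-label annotation of each original text with the annotation of its translated counterpart. The sketch below is only an illustration of that idea and is not claimed to be the analysis used in this study. It assumes binary label vectors over 7 emotions; the placeholder label names, the function names `label_flip_rate` and `exact_match_rate`, and the toy data are all hypothetical.

```python
from typing import Dict, List, Sequence

# Placeholder label names (assumption): the abstract only states that
# REDv2 is annotated for 7 emotions, so generic names are used here.
EMOTIONS: List[str] = [f"emotion_{i}" for i in range(1, 8)]


def label_flip_rate(
    original: Sequence[Sequence[int]],
    translated: Sequence[Sequence[int]],
) -> Dict[str, float]:
    """Per-emotion fraction of texts whose binary label differs
    between the original annotation and the translated annotation."""
    if len(original) != len(translated):
        raise ValueError("Both annotation sets must cover the same texts.")
    if not original:
        raise ValueError("At least one annotated text is required.")
    n_texts = len(original)
    rates: Dict[str, float] = {}
    for idx, emotion in enumerate(EMOTIONS):
        flips = sum(
            1 for src, tgt in zip(original, translated) if src[idx] != tgt[idx]
        )
        rates[emotion] = flips / n_texts
    return rates


def exact_match_rate(
    original: Sequence[Sequence[int]],
    translated: Sequence[Sequence[int]],
) -> float:
    """Fraction of texts whose full label set is unchanged by translation."""
    if len(original) != len(translated):
        raise ValueError("Both annotation sets must cover the same texts.")
    if not original:
        raise ValueError("At least one annotated text is required.")
    unchanged = sum(
        1 for src, tgt in zip(original, translated) if list(src) == list(tgt)
    )
    return unchanged / len(original)


# Toy data (hypothetical): 4 texts, 7 binary emotion labels each.
original_labels = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]
translated_labels = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1],
]

flip_rates = label_flip_rate(original_labels, translated_labels)
match_rate = exact_match_rate(original_labels, translated_labels)
print(flip_rates)
print(match_rate)
```

A per-emotion flip rate shows which emotions are most affected, while the exact-match rate gives a single stricter summary of how often the whole label set survives translation.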