Sentiment Analysis on Code-Mixed Telugu-English Text Using Advanced Approaches

Abstract

In our increasingly interconnected world, the extensive use of code-mixed languages, reflecting the varied linguistic backgrounds of users, is a defining feature of the linguistic landscape. Sentiment analysis on Code-Mixed Telugu-English Text (CMTET) poses unique challenges because the data is unstructured, containing informal language, transliterations, and spelling errors. This study experiments with transformer models such as BERT and RoBERTa for sentiment analysis of CMTET and assesses their performance. RoBERTa, with its extensive pre-training on large and diverse corpora, demonstrates a notable capability to handle the linguistic intricacies of code-mixed text, where Telugu and English words are intertwined. Its ability to capture context supports accurate interpretation of sentiment from surrounding words despite language mixing, and its Byte-Pair Encoding (BPE) tokenization adeptly manages the out-of-vocabulary words and transliterations common in code-mixed text. The proposed methodology achieves an accuracy of 81% for sentiment analysis of code-mixed Telugu-English text using the RoBERTa model.
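The abstract's point about BPE handling transliterated, out-of-vocabulary words can be illustrated with a toy Byte-Pair Encoding implementation. This is a minimal sketch of the BPE idea, not RoBERTa's actual tokenizer; the romanized Telugu words in the corpus are hypothetical examples chosen only to show how an unseen word is segmented into learned subwords rather than mapped to an unknown token.

```python
# Toy sketch of Byte-Pair Encoding (BPE), the subword scheme RoBERTa uses.
# It learns frequent symbol-pair merges from a corpus, then segments new
# words (e.g. Telugu transliterations) into known subwords instead of <unk>.
from collections import Counter

def merge_word(word, pair):
    """Replace every adjacent occurrence of `pair` in `word` with its concatenation."""
    out, i = [], 0
    while i < len(word):
        if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
            out.append(word[i] + word[i + 1])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return tuple(out)

def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules, greedily taking the most frequent pair."""
    words = [tuple(w) for w in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        words = [merge_word(w, best) for w in words]
    return merges

def tokenize(word, merges):
    """Segment a new word by replaying the learned merges in order."""
    w = tuple(word)
    for pair in merges:
        w = merge_word(w, pair)
    return list(w)

# Hypothetical romanized-Telugu training words:
corpus = ["bagundi", "bagunnava", "chala", "chala", "bagundi"]
merges = learn_bpe(corpus, 10)
# An unseen concatenation splits cleanly into learned subwords:
print(tokenize("chalabagundi", merges))  # → ['chala', 'bagundi']
```

In practice RoBERTa applies byte-level BPE with a vocabulary learned from its pre-training corpora, which is why misspellings and transliterations in code-mixed text still decompose into meaningful subword units.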
