Sentiment Analysis of Imbalanced Dataset through Data Augmentation and Generative Annotation using DistilBERT and Low-Rank Fine-Tuning

Abstract

This paper proposes a novel approach to sentiment analysis of imbalanced datasets that combines data augmentation with efficient fine-tuning. To address the limited representation of minority classes, we use GPT-4 to generate synthetic tweets via paraphrasing and back-translation (using Italian as the intermediary language). Our main contribution is the use of GPT-4 to annotate tweets with positive reasons, derived by inverting the ten predefined negative categories in the dataset. The augmented dataset is used to train a DistilBERT model for sentence embeddings, with Low-Rank Adaptation (LoRA) enabling efficient fine-tuning. A softmax layer classifies each tweet as positive, neutral, or negative. In experiments on the Twitter US Airline Sentiment dataset, the approach achieves 100% accuracy with minimal training time, highlighting the importance of data augmentation and efficient fine-tuning for robust sentiment analysis, particularly on imbalanced datasets.
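
To make the LoRA and softmax steps concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes a single frozen weight matrix W with a trainable low-rank update scaled by alpha / rank, followed by a softmax over the three sentiment classes. The names `LoRALinear` and `lora_param_counts`, the rank and alpha values, and the toy weights are all hypothetical; the abstract does not say which DistilBERT layers are adapted or which hyperparameters are used.

```python
import math
import random

LABELS = ["negative", "neutral", "positive"]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W (out x in) plus a low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration: only A (rank x in) and B (out x rank) would be
    trained; W stays fixed.
    """

    def __init__(self, W, rank, alpha, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(W), len(W[0])
        self.W = W
        self.scale = alpha / rank
        # A gets small random values, B starts at zero, so the adapted layer
        # initially computes exactly the same output as the frozen layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]


def lora_param_counts(d_in, d_out, rank):
    """Return (full fine-tuning params, LoRA params) for one weight matrix."""
    return d_in * d_out, rank * (d_in + d_out)


# Toy 4-dimensional "sentence embedding" mapped to 3 sentiment logits.
W = [[0.1, -0.2, 0.3, 0.0],
     [0.0, 0.1, 0.0, 0.2],
     [-0.1, 0.0, 0.2, 0.1]]
layer = LoRALinear(W, rank=2, alpha=4)
probs = softmax(layer.forward([1.0, 0.5, -0.5, 2.0]))
prediction = LABELS[probs.index(max(probs))]

# DistilBERT uses a hidden size of 768; compare one 768 x 768 matrix at rank 8.
full_params, lora_params = lora_param_counts(768, 768, 8)
```

Under these assumptions, a rank-8 update to one 768 x 768 matrix trains 12,288 parameters instead of 589,824, which is the kind of saving that makes LoRA fine-tuning fast.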
