Sentiment Analysis of Imbalanced Dataset through Data Augmentation and Generative Annotation using DistilBERT and Low-Rank Fine-Tuning

Hossein Nekkouei Nasrabadi
Mohammad Hossein Moattar

Abstract

This paper proposes a novel approach to sentiment analysis on imbalanced datasets that combines data augmentation with efficient fine-tuning. To address the limited representation of minority classes, we use GPT-4 to generate synthetic tweets through paraphrasing and back-translation (with Italian as the intermediary language). Our main contribution is the use of GPT-4 to annotate tweets with positive reasons, derived by inverting the ten predefined negative-reason categories in the dataset. The augmented dataset is used to train a DistilBERT model that produces sentence embeddings, and Low-Rank Adaptation (LoRA) enables efficient fine-tuning. A softmax layer then classifies each tweet as positive, neutral, or negative. Experiments on the Twitter US Airline Sentiment dataset demonstrate the efficacy of our approach, which achieves 100% accuracy with minimal training time. These results highlight the importance of data augmentation and efficient fine-tuning for robust sentiment analysis, particularly on imbalanced datasets.
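The abstract names LoRA and a softmax classifier but gives no rank, scaling factor, or target layers. The sketch below illustrates the general LoRA idea in dependency-free Python, assuming a frozen weight `W0` augmented by a trainable low-rank update `(alpha / r) * B @ A`, followed by a softmax over three sentiment classes. The class order, the toy weights, and the choice `r = 8` are hypothetical and are not taken from the paper; only the 768-dimensional hidden size is a known property of DistilBERT.

```python
import math
import random

# Class order is an assumption made for this illustration.
LABELS = ("negative", "neutral", "positive")


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W0 plus a trainable low-rank update (alpha / r) * B @ A.

    A generic sketch of Low-Rank Adaptation, not the authors' configuration.
    """

    def __init__(self, w0, rank, alpha, seed=0):
        self.w0 = w0  # frozen d_out x d_in pretrained weight (never updated)
        d_out, d_in = len(w0), len(w0[0])
        rng = random.Random(seed)
        # A (r x d_in) starts with small random values and B (d_out x r) starts at
        # zero, so the adapted layer initially reproduces the frozen layer exactly.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.w0, x)                   # W0 x
        update = matvec(self.b, matvec(self.a, x))  # B (A x), rank-r path
        return [h + self.scale * u for h, u in zip(base, update)]

    def trainable_params(self):
        # Only A and B would be trained; W0 stays frozen.
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


def softmax(logits):
    """Convert logits into class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical 4-dim "sentence embedding" and frozen 3 x 4 classification head.
    w0 = [[0.2, -0.1, 0.0, 0.3],
          [0.0, 0.1, 0.1, 0.0],
          [-0.2, 0.2, 0.1, -0.1]]
    head = LoRALinear(w0, rank=2, alpha=4)
    x = [1.0, 0.5, -0.5, 2.0]
    probs = softmax(head.forward(x))
    print(dict(zip(LABELS, probs)))

    # Trainable-parameter comparison for one 768 x 768 projection
    # (768 is DistilBERT's hidden size; rank 8 is a hypothetical choice).
    proj = LoRALinear([[0.0] * 768 for _ in range(768)], rank=8, alpha=16)
    print("full:", 768 * 768, "LoRA:", proj.trainable_params())
```

With `B` initialised to zero, fine-tuning starts from the pretrained behaviour, and for a 768 x 768 projection at rank 8 only 12,288 of 589,824 weights are trainable, which is the kind of saving that makes LoRA fine-tuning cheap.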

Version published on Research Square (10.21203/rs.3.rs-5879286/v1), Jan 28, 2025

Related articles

Advancing Sentiment Analysis in Gujarati: Performance Enhancement through a Hybrid Annotation Framework

This article has 2 authors:
1. Neha Shah¹
2. Preeti Baser²
This article has no evaluations. Latest version: Jan 6, 2026

Integrating Explainability for Sentiment Interpretation, Misclassification, and Bias Detection in Women-in-STEM Social Media

This article has 2 authors:
1. Shereen Fouad
2. Ezzaldin Alkooheji
This article has no evaluations. Latest version: Jan 12, 2026

CLARA: Enhancing Multimodal Sentiment Analysis via Efficient Vision-Language Fusion

This article has 3 authors:
1. Phuong Lam
2. Phan Thi Tuoi
3. Thien Khai Tran
This article has no evaluations. Latest version: Jan 7, 2026
