Sentiment Analysis of Social Media Data for Airline Brand Reputation Management Using Machine Learning Techniques in Python

Neha Singh Rajput

Abstract

Social media platforms, particularly Twitter, have become vital channels through which airlines receive large volumes of customer feedback every day. This user-generated content presents significant opportunities for sentiment analysis. Previous studies have shown that machine learning and natural language processing can identify meaningful patterns in customer opinions, although several obstacles persist, including sarcasm detection, imbalanced class distributions, and ambiguous expressions. This research aimed to determine whether Twitter conversations about airlines can be used to classify customer sentiment accurately through computational methods. We used the Twitter US Airline Sentiment dataset, a corpus of 14,640 manually annotated tweets about six major U.S. carriers, with each tweet labeled as positive, negative, or neutral. The text was cleaned, tokenized, stripped of stopwords, and lemmatized, and features were then extracted with Term Frequency-Inverse Document Frequency (TF-IDF) vectorization. Categorical features, namely the airline labels and complaint reasons, were encoded into formats suitable for computational modeling.

We trained and assessed three classifiers: Logistic Regression, Random Forest, and XGBoost. The dataset was partitioned into training and testing subsets with a 70:30 split that preserves the class proportions. Model performance was evaluated using accuracy, precision, recall, F1-score, confusion matrices, and ROC curves. XGBoost outperformed the other models, which suggests that gradient boosting is well suited to capturing patterns in text-derived features, although some misclassification patterns emerged, particularly between neutral and positive tweets. The results also underline the importance of data preprocessing: combining preprocessing, feature engineering, and supervised learning improved automated sentiment classification. Topic modeling of the negative tweets identified delays and customer service problems as the main sources of dissatisfaction, giving airlines actionable insight. The misclassified tweets frequently involved sarcasm, mixed sentiment, and informal language. Overall, the findings can help airlines track customer feedback, improve service quality, and support decision-making.
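Two steps named in the abstract, TF-IDF weighting and a class-preserving 70:30 split, can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' code: the stopword list, example tweets, label counts, and the function names `tokenize`, `tfidf`, and `stratified_split` are all assumptions made for the example, lemmatization is omitted, and the IDF formula shown (`ln(N / df) + 1`) is just one common variant. Libraries differ in smoothing and normalization.

```python
import math
import random
import re
from collections import Counter, defaultdict

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"a", "an", "and", "the", "no", "to", "my", "was"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf(docs):
    """Weight each term by (count / doc length) * (ln(N / df) + 1)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), sampling test_fraction from each class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Made-up example tweets, not taken from the dataset.
sample = [
    "Flight delayed again, no updates",
    "Great crew and smooth flight",
    "Flight delayed two hours",
]
vectors = tfidf([tokenize(t) for t in sample])
print(vectors[0])  # "flight" (in every tweet) gets the lowest weight

# Made-up imbalanced label list to show that class proportions carry over.
labels = ["negative"] * 60 + ["neutral"] * 20 + ["positive"] * 20
train_idx, test_idx = stratified_split(labels)
print(Counter(labels[i] for i in test_idx))
```

Sampling the test fraction separately within each class keeps a minority class from being under-represented in the test set by chance, which matters when most tweets fall in one class. In practice, scikit-learn's `TfidfVectorizer` and `train_test_split` (with its `stratify` argument) cover the same two steps.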

Version published to 10.20944/preprints202510.0782.v1
Oct 10, 2025

Related articles

State-of-the-Art Machine Learning Techniques in Sentiment Analysis for Social Media

This article has 3 authors:
1. Mohsen Mohammadagha
2. Israel Tshitenge
3. Ifetilayo Adebambo
This article has no evaluations
Latest version Aug 27, 2025

A Systematic Review of Sentiment Analysis Systems Applied to Textual Data

This article has 2 authors:
1. Phuong Dao Quoc
2. Vuong M. Ngo
This article has no evaluations
Latest version Oct 3, 2025

Brand Hate Detector: An R Shiny Application for Automated Detection and Multilevel Classification of Brand Hate in Consumer Reviews

This article has 1 author:
1. Mohamed Assoud
This article has no evaluations
Latest version Aug 18, 2025
