Analyzing Multilingual Conversations During COVID-19: An Imbalanced Class-Ensemble Learning Approach with Reweighted AdaBoost-SVM for Code-Switched Text Classification
Abstract
This study addresses the challenge of analyzing multilingual, code-switched conversations during the COVID-19 pandemic, a context in which traditional classifiers often fall short. We developed a cost-sensitive ensemble learning approach, a reweighted AdaBoost model with an SVM as its base learner (AdaBoost-SVM), designed to handle the imbalanced datasets common in code-switched communication. A key innovation of our approach is the rebalancing of the AdaBoost weights: by incrementally adjusting the weights of misclassified samples from both the minority and majority classes, each iteration produces a more balanced classification. This strategy significantly improves accuracy on the minority class, a common weakness of existing models. For comparison, we evaluated a range of machine learning and deep learning classifiers, including Naive Bayes, Decision Trees, SMOTEBoost, CNN, and Bi-LSTM, across two different multilingual datasets, using six distinct metrics, including P-mean. The results demonstrate that our proposed ensemble learning approach, fine-tuned with optimal hyperparameters and leveraging M-BERT for feature extraction, achieved accuracies of 78.84%, 86.56% and 83.96% on the test sets of the CTSA, TUNIZI and combined CTSA-TUNIZI datasets, respectively. This performance surpassed not only traditional classification methods but also advanced deep learning models such as Bi-LSTM.
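To make the class-dependent reweighting concrete, the sketch below shows one possible way a reweighted AdaBoost loop with SVM base learners could be structured. It is a minimal illustration, not the authors' exact formulation: the function names, the `minority_boost` factor, and the specific update rule are assumptions introduced for exposition only.

```python
# Hypothetical sketch of a cost-sensitive, reweighted AdaBoost-SVM.
# `minority_boost` and the exact update rule are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

def reweighted_adaboost_svm(X, y, n_rounds=10, minority_boost=1.5):
    """y is a binary label vector in {0, 1}; class 1 is assumed to be the minority class."""
    n = len(y)
    w = np.full(n, 1.0 / n)            # sample weights start uniform
    learners, alphas = [], []
    for _ in range(n_rounds):
        clf = SVC(kernel="rbf", gamma="scale")
        clf.fit(X, y, sample_weight=w)
        pred = clf.predict(X)
        miss = pred != y
        err = np.sum(w[miss]) / np.sum(w)
        if err == 0 or err >= 0.5:     # stop if the weak learner is too weak or perfect
            break
        alpha = 0.5 * np.log((1 - err) / err)
        # Cost-sensitive update: misclassified minority samples are
        # up-weighted more strongly than misclassified majority samples.
        cost = np.where(y == 1, minority_boost, 1.0)
        w *= np.exp(alpha * cost * miss)
        w /= w.sum()                   # renormalize each round
        learners.append(clf)
        alphas.append(alpha)
    return learners, alphas

def ensemble_predict(learners, alphas, X):
    # Weighted vote over the boosted SVMs (labels mapped to {-1, +1}).
    votes = sum(a * (2 * clf.predict(X) - 1) for clf, a in zip(learners, alphas))
    return (votes > 0).astype(int)
```

Under this assumed scheme, a misclassified minority-class sample receives a larger exponential boost than a misclassified majority-class sample, so successive SVMs concentrate more on the under-represented class, which is the intuition behind the improved minority-class accuracy reported in the abstract.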