Assessing Transformers and Traditional Models for Spanish-English Code-Switched Hate Detection
Abstract
Hate speech detection research has focused mainly on monolingual contexts, with limited exploration of multilingual and code-switched text, which introduces distinct linguistic complexities. This study examines hate speech detection in code-switched Spanish-English (Spanglish) content from social media, comparing transformer-based models (XLM-RoBERTa, DistilBERT, Multilingual BERT, and mT5) with traditional machine learning approaches, including support vector machines, logistic regression, and multinomial naïve Bayes using TF-IDF features. The results indicate that XLM-RoBERTa achieves the highest performance, with 96.14 percent accuracy, 96.16 percent precision, 96.14 percent recall, and a 96.12 percent F1-score, demonstrating its superiority in detecting code-switched hate speech. Although traditional models, particularly the SVM (94.03 percent accuracy), perform well, transformer-based approaches offer clear advantages in multilingual contexts.
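To make the two families of approaches concrete, the sketch below illustrates a traditional baseline of the kind described: TF-IDF features feeding a linear SVM via scikit-learn. The Spanglish examples and labels are invented placeholders, and the vectorizer settings are library defaults rather than the study's actual configuration.

```python
# Minimal sketch of a TF-IDF + linear SVM baseline (assumption: the
# study's exact preprocessing and hyperparameters are not shown here).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline

# Hypothetical code-switched samples (0 = non-hate, 1 = hate).
train_texts = [
    "que tengas un buen day, friend",
    "eres un idiota, get out of here",
    "me encanta this song, so good",
    "callate, nobody wants you here",
]
train_labels = [0, 1, 0, 1]

# Word-level TF-IDF with uni- and bigrams, then a linear SVM.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), lowercase=True)),
    ("svm", LinearSVC()),
])
pipeline.fit(train_texts, train_labels)

test_texts = ["you are basura, leave", "gracias for the help, amigo"]
print(pipeline.predict(test_texts))
```

For the transformer side, a comparable starting point is loading XLM-RoBERTa as a binary sequence classifier with Hugging Face Transformers; the fine-tuning loop and dataset handling are omitted, and the checkpoint name assumes the publicly available base model rather than the authors' trained weights.

```python
# Sketch of XLM-RoBERTa set up for two-class hate speech
# classification; predictions are meaningless before fine-tuning.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2
)

inputs = tokenizer("eres un idiota, get out of here", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.argmax(dim=-1))  # predicted class index
```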