Multi-Label Machine Learning Models for Trolling and Cyberbullying Prediction
Abstract
With the popularity of social media, online trolling and cyberbullying remain persistent and widespread threats to netizens, causing considerable harm to individual well-being and community cohesion. This study develops a multi-label machine learning framework for the automated prediction of trolling and cyberbullying online, using a publicly available Kaggle dataset (cyberbullying_dataset_CSV_version.csv) of 280,050 comments annotated with 81 non-mutually-exclusive categories. The corpus is genuinely multi-label (a mean of 1.99 labels per comment), with the most frequent categories including religious hate, political hate, other cyberbullying types, ethnic hate, threats, and trolling. Classical models (logistic regression, SVM, random forest), a sequence model (BiLSTM), and a transformer-based model (BERT) are benchmarked under a consistent pipeline that (i) learns per-label decision thresholds on a validation split and (ii) evaluates with metrics suited to multi-label settings (Micro/Macro-F1, Hamming Loss, Subset Accuracy). Experimental results show that fine-tuned BERT achieves the strongest overall performance, improving over BiLSTM by +0.06 Micro-F1 (0.81 to 0.87), +0.08 Macro-F1 (0.70 to 0.78), and +0.09 Subset Accuracy (0.52 to 0.61), while reducing Hamming Loss by 0.015 (0.067 to 0.052). We provide model architectures, implementation details, and rich visual diagnostics (grouped bar charts with uncertainty, multi-metric radar plots, and a label-wise heatmap), and we discuss thresholding, calibration, and fairness. These results and practices support reproducible research and reliable deployment of multi-label moderation systems trained on Kaggle-scale data.
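The per-label thresholding and multi-label evaluation described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function names (`tune_thresholds`, `evaluate`) and the threshold grid are assumptions, and the sketch assumes model outputs are available as per-label probabilities in a NumPy array, with scikit-learn providing the metrics.

```python
import numpy as np
from sklearn.metrics import f1_score, hamming_loss, accuracy_score

def tune_thresholds(y_val, proba_val, grid=np.linspace(0.1, 0.9, 17)):
    """Learn one decision threshold per label by maximizing that
    label's F1 on the validation split (grid values are an assumption)."""
    n_labels = y_val.shape[1]
    thresholds = np.full(n_labels, 0.5)
    for j in range(n_labels):
        best_f1 = -1.0
        for t in grid:
            pred_j = (proba_val[:, j] >= t).astype(int)
            f1 = f1_score(y_val[:, j], pred_j, zero_division=0)
            if f1 > best_f1:
                best_f1, thresholds[j] = f1, t
    return thresholds

def evaluate(y_true, proba, thresholds):
    """Apply the tuned per-label thresholds, then compute the four
    multi-label metrics used in the benchmark."""
    y_pred = (proba >= thresholds).astype(int)
    return {
        "micro_f1": f1_score(y_true, y_pred, average="micro", zero_division=0),
        "macro_f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
        "hamming_loss": hamming_loss(y_true, y_pred),
        # subset accuracy = exact-match ratio over full label vectors
        "subset_accuracy": accuracy_score(y_true, y_pred),
    }

# Tiny synthetic example: 4 comments, 2 labels.
y_val = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
proba_val = np.array([[0.9, 0.2], [0.1, 0.8], [0.7, 0.6], [0.3, 0.1]])
th = tune_thresholds(y_val, proba_val)
metrics = evaluate(y_val, proba_val, th)
```

Tuning thresholds per label rather than using a global 0.5 cutoff matters in multi-label moderation data because label frequencies are highly imbalanced, so the F1-optimal operating point differs from label to label.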