Multi-Label Machine Learning Models for Trolling and Cyberbullying Prediction
Abstract
With the popularity of social media, online trolling and cyberbullying remain persistent and widespread threats to netizens, causing considerable harm to individual well-being and community cohesion. This study develops a multi-label machine learning framework for the automated prediction of trolling and cyberbullying online, using a publicly available Kaggle dataset (cyberbullying_dataset_CSV_version.csv) of 280,050 comments annotated with 81 non-mutually-exclusive categories. The corpus is genuinely multi-label (a mean of 1.99 labels per comment), with the most frequent categories including religious hate, political hate, other cyberbullying types, ethnic hate, threats, and trolling. Classical models (logistic regression, SVM, random forest), a sequence model (BiLSTM), and a transformer-based model (BERT) are benchmarked under a consistent pipeline that (i) learns per-label decision thresholds on a validation split and (ii) evaluates with metrics suited to multi-label settings (Micro/Macro-F1, Hamming Loss, Subset Accuracy). Experimental results show that fine-tuned BERT achieves the strongest overall performance, improving over BiLSTM by +0.06 Micro-F1 (0.81 to 0.87), +0.08 Macro-F1 (0.70 to 0.78), and +0.09 Subset Accuracy (0.52 to 0.61), while reducing Hamming Loss by 0.015 (0.067 to 0.052). We provide model architectures, implementation details, and rich visual diagnostics (grouped bar charts with uncertainty, multi-metric radar plots, and a label-wise heatmap), and we discuss thresholding, calibration, and fairness. These results and practices support reproducible research and reliable deployment of multi-label moderation systems trained on Kaggle-scale data.
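The per-label thresholding and multi-label evaluation described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function names (`tune_thresholds`, `evaluate`) and the threshold grid are assumptions, and the sketch assumes model outputs are available as per-label probabilities in a NumPy array, with scikit-learn providing the metrics.

```python
import numpy as np
from sklearn.metrics import f1_score, hamming_loss, accuracy_score

def tune_thresholds(y_val, proba_val, grid=np.linspace(0.1, 0.9, 17)):
    """Learn one decision threshold per label by maximizing that
    label's F1 on the validation split (grid values are an assumption)."""
    n_labels = y_val.shape[1]
    thresholds = np.full(n_labels, 0.5)
    for j in range(n_labels):
        best_f1 = -1.0
        for t in grid:
            pred_j = (proba_val[:, j] >= t).astype(int)
            f1 = f1_score(y_val[:, j], pred_j, zero_division=0)
            if f1 > best_f1:
                best_f1, thresholds[j] = f1, t
    return thresholds

def evaluate(y_true, proba, thresholds):
    """Apply the tuned per-label thresholds, then compute the four
    multi-label metrics used in the benchmark."""
    y_pred = (proba >= thresholds).astype(int)
    return {
        "micro_f1": f1_score(y_true, y_pred, average="micro", zero_division=0),
        "macro_f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
        "hamming_loss": hamming_loss(y_true, y_pred),
        # subset accuracy = exact-match ratio over full label vectors
        "subset_accuracy": accuracy_score(y_true, y_pred),
    }

# Tiny synthetic example: 4 comments, 2 labels.
y_val = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
proba_val = np.array([[0.9, 0.2], [0.1, 0.8], [0.7, 0.6], [0.3, 0.1]])
th = tune_thresholds(y_val, proba_val)
metrics = evaluate(y_val, proba_val, th)
```

Tuning thresholds per label rather than using a global 0.5 cutoff matters in multi-label moderation data because label frequencies are highly imbalanced, so the F1-optimal operating point differs from label to label.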