An Enhanced Machine Learning with NLP Modelling Technique for Smishing Attacks Detection in Low-Resourced Languages

Aaron Zimba
Katongo Ongani Phiri

Abstract

Smishing, a form of phishing through SMS, has emerged as a significant cybersecurity threat, particularly on mobile money platforms in regions with limited cybersecurity awareness. This research introduces a machine learning model integrated with natural language processing (NLP) techniques for smishing detection. The proposed model targets English and Bemba, a low-resourced language, addressing a gap in cybersecurity research for linguistically diverse, resource-constrained environments. The model incorporates pseudonymization to enhance data security by replacing sensitive information such as personal identifiers while retaining the contextual integrity of messages. Named Entity Recognition (NER) is employed to detect and mask sensitive entities, further safeguarding user privacy. To improve robustness against adversarial attacks, adversarial training is applied: the model is exposed to perturbed inputs during training so that it becomes more resilient to manipulation. L1 regularization is used to reduce overfitting and keep the model efficient. The evaluation used datasets in English, Bemba, and a combination of both to assess the model's adaptability to multilingual inputs. Across datasets, the model achieved high F1-scores and low log loss, with AUC ranging from 0.93 (Bemba) to 0.98 (English–Bemba) and consistently strong MCC. These metrics indicate that the model distinguishes effectively between smishing and legitimate messages. By combining machine learning and NLP in a privacy-preserving and security-enhanced framework, this research provides a scalable, efficient solution for smishing detection in under-resourced contexts and contributes to cybersecurity for low-resourced languages.
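
The pseudonymization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract says NER is used to find sensitive entities, whereas the sketch below substitutes a simple regular expression for phone-like numbers. The function name `pseudonymize`, the pattern `PHONE_RE`, the salt value, and the sample message are all assumptions made for illustration.

```python
import hashlib
import re

# Assumed pattern (not from the paper): an optional "+" followed by 9 to 15
# digits, a rough stand-in for the NER step mentioned in the abstract.
PHONE_RE = re.compile(r"\+?\d{9,15}")


def pseudonymize(message: str, salt: str = "demo-salt") -> str:
    """Replace each phone-like number with a stable placeholder token.

    The same number always maps to the same token, so the structure of the
    message (who is mentioned where) survives while the raw identifier does not.
    """

    def _token(match: re.Match) -> str:
        # Hash the salted number and keep a short prefix as the pseudonym.
        digest = hashlib.sha256((salt + match.group()).encode("utf-8")).hexdigest()
        return f"<PHONE_{digest[:8]}>"

    return PHONE_RE.sub(_token, message)


if __name__ == "__main__":
    # Made-up message for illustration only.
    sms = "You have received K500 from 0971234567. Call 0971234567 to confirm."
    print(pseudonymize(sms))
```

Because the token is derived from the number itself, repeated mentions of one sender stay linked after masking, which is one way to keep the contextual integrity the abstract refers to. A real deployment would need a secret salt and a broader entity detector than this single pattern.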

Version published to 10.21203/rs.3.rs-7521286/v1 on Research Square
Oct 6, 2025

Related articles

Fine-grained Insider Threat Detection with Large Language Models: A Comparative Study

This article has 4 authors:
1. Parvin Ahmadi Doval Amiri
2. Alexis Brissard
3. Frédéric Cuppens
4. Amal Zouaq
This article has no evaluations
Latest version Sep 23, 2025

Integrating Machine Learning and Artificial Intelligence for Next-Generation Cybersecurity in Computer Science Applications

This article has 1 author:
1. Naveed Akhtar
This article has no evaluations
Latest version Sep 25, 2025

Multi-Label Machine Learning Models for Trolling and Cyberbullying Prediction

This article has 5 authors:
1. Adenrele A. Afolorunso
2. Oluwasogo A. Okunade
3. Morufu Olalere
4. Adeyinka O. Abiodun
5. Olawale Surajudeen Adebayo
This article has no evaluations
Latest version Oct 29, 2025