Emergent Threat Discovery: Unsupervised Machine Learning for Phishing Campaign Analysis

Muhammad Fahad Zia
Sri Harish Kalidass



Abstract

Phishing, a persistent form of cyber-attack, continues to cause data breaches affecting both individuals and corporations. Web-based phishing, often spread through social media posts and emails, remains a significant threat because of the malicious URLs it distributes. Traditional detection methods rely primarily on supervised learning, which requires extensive labeled datasets and substantial computational power; collecting and processing such data can compromise user privacy. These methods also struggle to keep pace with evolving threats, especially those generated using advanced AI techniques. Unsupervised clustering methods have been explored, but challenges related to scalability and detection accuracy persist. In this extended work, we present a more robust phishing detection framework that uses large language models (LLMs) to create a refined embedding space, significantly improving cluster cohesion and silhouette scores. A key contribution of this work is a push towards Explainable AI (XAI): the framework provides in-depth visualizations of phishing clusters, including scatter plots and keyword-based cluster explanations. These visualizations give a clearer understanding of the clustering process and reveal campaign patterns, such as campaigns targeting specific sectors (e.g., finance). Building on this, we also introduce enhanced campaign detection visualizations that characterize phishing campaigns in more detail, including their evolution and sector-specific targeting. Our system remains fast, scalable, and privacy-preserving, providing a comprehensive solution for real-time phishing campaign detection while improving interpretability through XAI.
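The abstract judges the embedding space by cluster cohesion and silhouette score, and explains clusters through keywords. The Python sketch below illustrates both ideas on toy data; it is not the authors' implementation. Assumptions: the two-dimensional vectors stand in for LLM embeddings of URLs, the cluster labels are taken as given from some clustering step, distance is Euclidean, and the URLs (on the reserved .example domain), the stop-word list, and the frequency-based keyword heuristic are all hypothetical. For each point, the silhouette coefficient is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to its own cluster and b(i) the lowest mean distance to any other cluster.

```python
import math
import re
from collections import Counter

# Hypothetical tokens ignored when explaining clusters.
STOP_WORDS = {"http", "https", "www", "com", "net", "example"}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points.

    For point i: a = mean distance to the other members of its cluster,
    b = lowest mean distance to the members of any other cluster,
    s(i) = (b - a) / max(a, b). Singleton clusters get s(i) = 0.
    """
    labels = list(labels)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        same = [q for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = sum(euclidean(p, q) for q in same) / len(same)
        b = min(
            sum(euclidean(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def cluster_keywords(urls, labels, top_n=3):
    """Crude keyword explanation: the most frequent URL tokens in each cluster."""
    counts = {}
    for url, label in zip(urls, labels):
        tokens = [
            t for t in re.split(r"[^a-z0-9]+", url.lower())
            if len(t) > 2 and t not in STOP_WORDS
        ]
        counts.setdefault(label, Counter()).update(tokens)
    return {label: [t for t, _ in c.most_common(top_n)] for label, c in counts.items()}


# Toy data: 2-D vectors standing in for LLM embeddings of the URLs.
urls = [
    "http://secure-bank-login.example/verify",
    "http://bank-account-verify.example/login",
    "http://parcel-delivery-fee.example/pay",
    "http://delivery-parcel-track.example/pay",
]
embeddings = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]  # assumed output of some clustering step

score = silhouette_score(embeddings, labels)
keywords = cluster_keywords(urls, labels)
print(round(score, 3))  # 0.802
print(keywords)  # {0: ['bank', 'login', 'verify'], 1: ['parcel', 'delivery', 'pay']}
```

A score near 1 means points sit much closer to their own cluster than to the nearest other cluster, while values near 0 indicate overlapping clusters. In a real pipeline, the labels would come from the clustering algorithm and learned embeddings would replace the toy vectors; the keyword lists hint at the kind of sector-specific theme (banking versus parcel delivery here) that keyword-based explanations are meant to surface.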

Version published to 10.21203/rs.3.rs-6718753/v1 on Research Square
May 23, 2025

Related articles

Machine Learning for Lateral Movement Detection using Sysmon Logs: An Empirical Comparison of Imbalanced and Resampled Data

This article has 2 authors:
1. Christos Smiliotopoulos
2. Georgios Kambourakis
This article has no evaluations. Latest version: Jun 6, 2025

Hook, Line, and Sinker: AI-Powered Phishing Defense of Digital Communications

This article has 4 authors:
1. Harsh Rathod
2. Pooja Purohit
3. Rishika Singh
4. Niki Modi
This article has no evaluations. Latest version: Jun 19, 2025

Machine Learning for Privacy Threat Classification: A Systematic Review

This article has 4 authors:
1. L.D.C.S. Subhashini
2. Yuefeng Li
3. Xiaohui Tao
4. Jianming Yong
This article has no evaluations. Latest version: Jul 2, 2025
