Emergent Threat Discovery: Unsupervised Machine Learning for Phishing Campaign Analysis
Abstract
Phishing, a persistent cyber-attack, continues to cause data breaches affecting both individuals and corporations. Web-based phishing, often spread through social media posts and emails, remains a significant threat because of the risks posed by phishing URLs. Traditional detection methods rely primarily on supervised learning, which requires extensive labeled datasets and substantial computational power and can compromise user privacy. These methods also struggle to keep pace with evolving threats, especially those generated with advanced AI techniques. Unsupervised clustering methods have been explored, but challenges related to scalability and detection accuracy persist. In this extended work, we present a more robust phishing detection framework that uses large language models (LLMs) to create a refined embedding space, significantly improving cluster cohesion and silhouette scores. A key contribution of this work is a push towards Explainable AI (XAI): the framework provides in-depth visualizations of phishing clusters, including scatter plots and keyword-based cluster explanations. These visualizations give a clearer understanding of the clustering process and reveal campaign patterns, such as campaigns targeting specific sectors (e.g., finance). Building on this, we also introduce enhanced campaign detection visualizations that characterize phishing campaigns in more detail, including their evolution and sector-specific targeting. Our system remains fast, scalable, and privacy-preserving, providing a comprehensive solution for real-time phishing campaign detection while improving interpretability through XAI.
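The abstract measures cluster quality with silhouette scores and explains clusters through keywords. The Python below is an illustrative sketch only. It computes the mean silhouette coefficient and a simple per-cluster keyword summary. It assumes the LLM embeddings and the cluster labels already exist. Toy 2-D vectors and hand-assigned labels stand in for them, because the abstract names neither the embedding model nor the clustering algorithm. The function names (`mean_silhouette`, `cluster_keywords`), the URL tokenization rule, and the stop-word list are hypothetical and are not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for URL tokens; not taken from the paper.
URL_STOPWORDS = {"http", "https", "www", "com", "net", "org", "html", "php", "example"}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(points, labels):
    """Mean silhouette coefficient s(i) = (b(i) - a(i)) / max(a(i), b(i)).

    a(i): mean distance from point i to the other members of its own cluster.
    b(i): smallest mean distance from point i to the members of any other cluster.
    Requires at least two distinct labels.
    """
    n = len(points)
    members = {}
    for idx, label in enumerate(labels):
        members.setdefault(label, []).append(idx)
    if len(members) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i in range(n):
        own = [j for j in members[labels[i]] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in idxs) / len(idxs)
            for label, idxs in members.items()
            if label != labels[i]
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / n


def cluster_keywords(urls, labels, top_k=3):
    """Return the top_k most frequent URL tokens per cluster as a crude explanation."""
    counts = {}
    for url, label in zip(urls, labels):
        # Split the lowercased URL on non-alphanumeric runs; drop short and stop-word tokens.
        tokens = [
            t for t in re.split(r"[^a-z0-9]+", url.lower())
            if len(t) > 2 and t not in URL_STOPWORDS
        ]
        counts.setdefault(label, Counter()).update(tokens)
    return {label: [word for word, _ in c.most_common(top_k)] for label, c in counts.items()}


# Toy data: 2-D vectors stand in for LLM embeddings of each URL.
urls = [
    "http://secure-bank-login.example/verify-account",
    "http://bank-account-update.example/login",
    "http://free-gift-card.example/claim-prize",
    "http://prize-winner.example/claim-gift",
]
embeddings = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels = [0, 0, 1, 1]  # assumed output of some unspecified clustering step

print(round(mean_silhouette(embeddings, labels), 3))  # → 0.859
print(cluster_keywords(urls, labels))
```

A score near 1 means each point sits much closer to its own cluster than to the nearest other cluster. Relabeling the same points as [0, 1, 0, 1] drives the score below zero. Raw token frequency is a deliberately simple stand-in for whatever keyword explanation the framework actually uses.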