Dual-Input Fusion Deep Learning Framework for URL-Based Phishing Detection
Abstract
URLs crafted to deceive users during phishing attacks pose a serious threat to individuals and organizations. In this work, we present a dual-input fusion deep learning framework that combines character embeddings extracted from raw URLs with engineered lexical and host-based features. We built a dataset of 2 million URLs collected from the Tranco and PhishTank repositories. We designed and evaluated three dual-input architectures (CNN, LSTM, and GRU) and compared them against a baseline deep neural network that used only the engineered features. The dual-input CNN achieved 99.91% accuracy, 99.98% precision, 99.84% recall, and a 99.91% F1 score. Because it preserves both local character patterns and sequential dependencies, dual-input fusion offers a promising path toward real-time phishing detection in an evolving threat landscape.
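The abstract does not specify preprocessing or fusion details, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It builds the two inputs for one URL. The first is a padded sequence of character indices, which in the described models would feed an embedding layer followed by CNN, LSTM, or GRU layers. The second is a small vector of engineered lexical features. The alphabet, `MAX_LEN`, the feature selection, and all helper names (`encode_chars`, `lexical_features`, `dual_inputs`, `fuse`) are hypothetical. Host-based features such as WHOIS or DNS records are omitted because they require network lookups. The `fuse` function stands in for the learned sequence branch with untrained random embeddings and mean pooling, then fuses by concatenation; concatenation is a common choice but is an assumption here.

```python
import ipaddress
import random
import re
from urllib.parse import urlsplit

# Hypothetical alphabet. Index 0 is reserved for padding and index 1 for
# out-of-vocabulary characters (assumed scheme, not taken from the paper).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
VOCAB = {ch: i + 2 for i, ch in enumerate(ALPHABET)}
PAD, OOV = 0, 1
MAX_LEN = 200  # assumed maximum sequence length


def encode_chars(url: str, max_len: int = MAX_LEN) -> list[int]:
    """Input 1: lowercase, map characters to indices, truncate, then right-pad."""
    ids = [VOCAB.get(ch, OOV) for ch in url.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def _host_is_ip(host: str) -> bool:
    """True if the hostname is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> list[float]:
    """Input 2: a few illustrative lexical features (hypothetical selection)."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""  # hostname excludes any "user@" prefix
    return [
        float(len(url)),                          # total URL length
        float(len(host)),                         # hostname length
        float(host.count(".")),                   # dots in the hostname
        float(sum(ch.isdigit() for ch in url)),   # digit count
        float("@" in url),                        # '@' can disguise the real host
        float(_host_is_ip(host)),                 # raw IP instead of a domain name
        float(parts.scheme == "https"),           # HTTPS scheme
        float(len(re.findall(r"[-_]", host))),    # hyphens/underscores in host
    ]


def dual_inputs(url: str) -> tuple[list[int], list[float]]:
    """Return the (character sequence, feature vector) pair for one URL."""
    return encode_chars(url), lexical_features(url)


def fuse(char_ids: list[int], features: list[float],
         embed_dim: int = 4, seed: int = 0) -> list[float]:
    """Toy fusion: mean-pool untrained random embeddings over non-padding
    positions (a stand-in for a learned CNN/LSTM/GRU branch), then
    concatenate the pooled vector with the engineered features."""
    rng = random.Random(seed)
    table = [[rng.uniform(-1.0, 1.0) for _ in range(embed_dim)]
             for _ in range(len(VOCAB) + 2)]
    real = [i for i in char_ids if i != PAD]
    pooled = [sum(table[i][d] for i in real) / max(len(real), 1)
              for d in range(embed_dim)]
    return pooled + features
```

For example, `dual_inputs("http://paypal.com@192.168.0.1/login")` sets both the `@` flag and the raw-IP flag, two lexical cues often associated with phishing URLs. In a trained dual-input model, the pooled vector in `fuse` would instead come from the learned CNN, LSTM, or GRU branch.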