A Hybrid Deep Ensemble Framework with Interpretability for Phishing URL Detection
Abstract
Phishing URL detection is a critical security challenge in modern cyberspace. Traditional methods rely heavily on handcrafted statistical features, while deep learning approaches leverage sequential character-level representations of URLs. This study proposes a novel hybrid ensemble framework that integrates character-level deep neural networks with behavioral statistical modeling, combined via dynamic AUC-weighted stacking. Specifically, a window-based sparse attention mechanism is applied over bidirectional LSTM (BiLSTM) representations, further enhanced with convolutional modules, to extract both global and local textual patterns. In parallel, handcrafted features based on URL, domain, and page-structure statistics are modeled with gradient boosting machines. A dual-level stacking ensemble, combining dynamic soft weighting with a logistic regression meta-learner, makes the final decision. Extensive experiments on a large-scale public dataset demonstrate that the method achieves competitive performance, while attention visualization and feature attribution provide interpretability for practical deployment.
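The AUC-weighted soft combination mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything here is an assumption made for illustration: the function names (`roc_auc`, `auc_weights`, `soft_vote`), the choice to weight each base model in proportion to how far its validation AUC exceeds 0.5, and the use of two base models (one deep, one gradient-boosted). The logistic regression meta-learner that forms the second stacking level is not shown.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of ROC AUC; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_weights(labels: Sequence[int],
                model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Turn each model's validation AUC into a normalized weight.

    Assumption (not from the paper): the weight is proportional to
    max(AUC - 0.5, 0), so a model no better than chance gets zero weight.
    """
    margins = [max(roc_auc(labels, s) - 0.5, 0.0) for s in model_scores]
    total = sum(margins)
    if total == 0.0:
        # No model beats chance: fall back to uniform weights.
        return [1.0 / len(margins)] * len(margins)
    return [m / total for m in margins]


def soft_vote(weights: Sequence[float],
              model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Weighted average of per-model phishing probabilities, per sample."""
    n_samples = len(model_scores[0])
    return [
        sum(w * s[i] for w, s in zip(weights, model_scores))
        for i in range(n_samples)
    ]
```

In use, the weights would be estimated on a held-out validation split and then applied to test-time probabilities, for example `soft_vote(auc_weights(val_labels, [deep_val, gbm_val]), [deep_test, gbm_test])`. Recomputing the weights whenever validation data changes is one plausible reading of "dynamic" in this context.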