Enhanced Phishing Detection Using Binary Encoding XGBoost and LSTM Feature Extraction and Capsule Network Classification
Abstract
Phishing attacks are a critical threat to online security, and traditional detection methods cannot keep pace with attackers' ever-evolving tactics. This research proposes a phishing detection system based on Capsule Networks (CapsNet) that incorporates XGBoost and LSTM for feature extraction and Gradient Boosting for feature selection. The system operates on email data and captures critical features such as sender information, subject lines, email bodies and URLs. The data are preprocessed with binary encoding before being passed to the model. The model is evaluated using Accuracy, Precision, Recall and F1-score. The CapsNet model achieves an accuracy of 99.62%, a precision of 99.53%, a recall of 99.70% and an F1-score of 99.62%. It outperforms other current phishing detection methods such as PDMLP, AdaBoost and Naive Bayes (NB), especially in sensitivity and overall classification performance. Its low false positive rate (FPR, 0.0173) and false negative rate (FNR, 0.022889) further support its reliability for real-world phishing detection. The proposed hybrid system is a promising defense against advanced phishing attacks, as it detects both phishing websites and phishing emails with high accuracy.
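The evaluation metrics named in the abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp: phishing samples correctly flagged as phishing
    fp: legitimate samples wrongly flagged as phishing
    tn: legitimate samples correctly passed
    fn: phishing samples wrongly passed as legitimate
    """
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "fpr": safe_div(fp, fp + tn),  # false positive rate
        "fnr": safe_div(fn, fn + tp),  # false negative rate
    }


# Hypothetical counts for illustration only; these are not the paper's results.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions, FNR is the complement of recall and FPR is the complement of specificity, so the two error rates summarize how often phishing messages slip through and how often legitimate messages are blocked.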