A Comparative Analysis of Machine Learning Models for URL-Based Phishing Detection
Abstract
Phishing attacks remain a significant cybersecurity threat. The central challenge is detecting malicious URLs accurately and automatically, because traditional methods often fall short against evolving attacker techniques. This research addresses that need by evaluating machine learning approaches to URL analysis. The study uses a dataset of labeled phishing and legitimate URLs, each described by 30 features spanning lexical, host-based, and content-related attributes. Five machine learning models were trained and compared: Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), XGBoost (XGB), and a Stacking Classifier ensemble. The XGBoost classifier achieved the highest accuracy, correctly classifying approximately 97.4% of URLs in the test set. The results show that machine learning, particularly XGBoost, can detect phishing URLs with high accuracy when given a comprehensive feature set. The study also contributes a functional prototype system that demonstrates the approach.
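As a rough illustration of what lexical URL features and the reported accuracy metric might look like in practice, the following minimal sketch uses only the Python standard library. The feature names and the helper functions `lexical_features` and `accuracy` are hypothetical choices made for this example; they are not the 30 features or the code used in the study, and host-based and content-related features are not covered.

```python
import re
from urllib.parse import urlparse

# Matches hosts written as a dotted IPv4 address, e.g. "192.168.0.1".
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL string.

    These features are assumptions for demonstration only, not the
    feature set described in the abstract.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "has_at_symbol": "@" in url,
        "host_is_ip": bool(IPV4_HOST.match(host)),
        "num_dots_in_host": host.count("."),
        "has_hyphen_in_host": "-" in host,
        "uses_https": parsed.scheme == "https",
    }


def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of predictions that equal the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    print(lexical_features("http://192.168.0.1/login@secure"))
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

Feature dictionaries of this kind could be converted to numeric vectors and passed to any of the classifiers named in the abstract. An accuracy of roughly 0.974 from a function like `accuracy` would correspond to the 97.4% figure reported for XGBoost.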