Detecting malicious websites using machine learning models by incorporating both lexical and network-based features.
Abstract
Blacklists are a common approach to detecting malicious websites. However, they are inherently incomplete and are difficult to keep up to date as new harmful websites appear. To enhance security and reduce exposure to these attacks, techniques are needed that can automatically identify newly emerging malicious websites, and machine learning models offer a promising solution. This study applies eight machine learning models, namely Random Forests (RF), Decision Trees (DT), Logistic Regression (LR), Naive Bayes (NB), K-Nearest Neighbors (KNN), Support Vector Machines (SVM), XGBoost, and LightGBM, to detect and classify malicious websites. The models analyze lexical and network-based features and patterns associated with malicious URLs, enabling accurate identification and proactive defense against such threats. The study also investigates ensemble methods, particularly stacking, to build a new model called DKN. It describes the experimental setup, including the dataset source, feature extraction, and evaluation measures, and presents the architecture of the DKN model. The results show how well the individual models and the DKN stacking ensemble classify URLs as malicious or benign. The paper also addresses the problem of imbalanced datasets, examining downsampling and oversampling as ways to improve model performance. By investigating the combination of multiple features and machine learning models to produce accurate predictions, the research contributes to the field of malicious website detection.
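
The abstract names downsampling and oversampling as ways to handle imbalanced datasets but does not say which variants the study uses. As a minimal sketch, assuming plain random resampling and a dataset of (features, label) pairs, the two ideas might look like the following. The function names, the toy data, and the label convention (0 for benign, 1 for malicious) are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import Counter, defaultdict


def group_by_label(samples):
    """Group (features, label) pairs by their label."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append((features, label))
    return groups


def random_oversample(samples, seed=0):
    """Duplicate randomly chosen samples of the smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    groups = group_by_label(samples)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Draw the missing samples with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


def random_downsample(samples, seed=0):
    """Keep a random subset of the larger classes
    so every class is as small as the smallest one."""
    rng = random.Random(seed)
    groups = group_by_label(samples)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        # Draw without replacement, so no sample is repeated.
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


def label_counts(samples):
    """Count how many samples carry each label."""
    return dict(Counter(label for _, label in samples))


# Hypothetical toy dataset: 8 benign (label 0) and 2 malicious (label 1) samples.
toy = [((i,), 0) for i in range(8)] + [((i,), 1) for i in range(8, 10)]

oversampled = random_oversample(toy)
downsampled = random_downsample(toy)
```

On the toy data, oversampling yields 8 samples per class and downsampling yields 2 per class. In practice, resampling is usually applied only to the training split, so that duplicated or discarded samples do not distort the evaluation measures.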