Comparative Evaluation of Machine Learning Models with Different Data Balancing Techniques for DDoS Attack Detection


Abstract

This study compares the efficacy of machine learning models for detecting TCP SYN-based Distributed Denial of Service (DDoS) attacks, using the CIC-DDoS-2019 and CIC-IoT-2023 datasets. To address the inherent class imbalance in these datasets, several balancing techniques were applied to enhance model performance: Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), SMOTE with Tomek Links (SMOTE-TomekLink), and SMOTE with Edited Nearest Neighbors (SMOTE-ENN). Random Forest, Naive Bayes, and Logistic Regression models were evaluated on accuracy, precision, recall, F1 score, balanced accuracy, ROC-AUC, and detection time. The detection performance of each model was optimized by varying the classification cutoff threshold. Furthermore, unlike other works in the literature, we used two different datasets for training and testing to demonstrate the robustness of the models: CIC-DDoS-2019 for training and CIC-IoT-2023 for testing. The results highlight the critical role of data balancing in improving detection capabilities. Logistic Regression with SMOTE balancing consistently outperformed the tree-based (Random Forest) and probabilistic (Naive Bayes) models. Tuning the cutoff values revealed the trade-offs between precision and recall and further improved the models' performance. The study offers practical insights into enhancing intrusion detection systems by integrating balancing techniques and optimizing thresholds, paving the way for more robust cybersecurity frameworks.
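The cutoff tuning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the labels and scores below are invented toy values, the function names are hypothetical, and selecting the cutoff by F1 score is an assumption, since the abstract does not state which metric drove the selection.

```python
from typing import List, Sequence, Tuple

# Toy data (invented for illustration): 1 = attack, 0 = benign.
Y_TRUE: List[int] = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
# Hypothetical attack probabilities produced by some classifier.
SCORES: List[float] = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.40, 0.60, 0.70, 0.90]
# Candidate classification cutoffs to sweep.
CUTOFFS: List[float] = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]


def precision_recall_f1(
    y_true: Sequence[int], scores: Sequence[float], cutoff: float
) -> Tuple[float, float, float]:
    """Label a sample as an attack when score >= cutoff, then score the result."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= cutoff else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_cutoff(
    y_true: Sequence[int], scores: Sequence[float], cutoffs: Sequence[float]
) -> float:
    """Return the cutoff with the highest F1 (assumed selection criterion)."""
    return max(cutoffs, key=lambda c: precision_recall_f1(y_true, scores, c)[2])


if __name__ == "__main__":
    for c in CUTOFFS:
        p, r, f = precision_recall_f1(Y_TRUE, SCORES, c)
        print(f"cutoff={c:.1f} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
    print("best cutoff by F1:", best_cutoff(Y_TRUE, SCORES, CUTOFFS))
```

On this toy data, lowering the cutoff raises recall at the cost of precision, and raising it does the reverse, which is the precision-recall trade-off the abstract refers to. In practice the cutoff would be chosen on held-out validation data, not on the test set.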
