GAN-based Synthetic Data Generation for Minority Intrusion Classes in IoT Datasets
Abstract
The proliferation of Internet of Things (IoT) devices has heightened the need for robust Intrusion Detection Systems (IDS) capable of identifying a wide spectrum of cyber threats. However, a persistent challenge in IoT intrusion detection is the severe class imbalance in publicly available datasets, where minority intrusion classes, such as User-to-Root (U2R) and Remote-to-Local (R2L) attacks, are heavily underrepresented. This imbalance leads to poor detection performance for rare but critical attack types. In this study, we propose a Generative Adversarial Network (GAN)-based framework that generates synthetic intrusion samples specifically for these minority classes. Our approach trains class-conditional GANs to learn the data distribution of underrepresented attacks and to generate high-fidelity synthetic samples, which are then used to augment the training sets of conventional classifiers. We conduct extensive experiments on benchmark IoT intrusion datasets, including Bot-IoT and CICIDS2017, and evaluate the impact of GAN-based augmentation on multiple machine learning classifiers. The results show that incorporating GAN-generated samples significantly improves minority-class metrics, particularly recall and F1-score, without degrading overall system performance. Compared with traditional oversampling methods such as SMOTE, our GAN-based approach produces more realistic synthetic samples and generalizes better. This research highlights the potential of deep generative models to address data imbalance in cybersecurity applications and offers a promising direction for improving the accuracy and reliability of IDS in IoT environments.
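The augmentation step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `augment_minority_classes` and `make_gaussian_stand_in` are hypothetical names, each sample is assumed to be a (feature vector, label) pair, and every class below a single `target_count` is topped up to that count. The per-class Gaussian sampler is only a placeholder exposing the same `generate(label, n)` interface that a trained class-conditional generator G(z, y) would provide; it is not a GAN.

```python
import random
import statistics
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# A sample is (feature vector, class label), e.g. ([0.3, 1.2], "u2r").
Sample = Tuple[List[float], str]
# Assumed interface: generate(label, n) returns n synthetic feature vectors for `label`.
GenerateFn = Callable[[str, int], List[List[float]]]


def make_gaussian_stand_in(data: Sequence[Sample], seed: int = 0) -> GenerateFn:
    """Hypothetical placeholder for a trained class-conditional generator G(z, y).

    Fits an independent Gaussian per feature and per class, then samples from it.
    This is NOT a GAN; it only mimics the generate(label, n) interface.
    """
    rng = random.Random(seed)
    rows_by_label: Dict[str, List[List[float]]] = {}
    for features, label in data:
        rows_by_label.setdefault(label, []).append(features)

    # (mean, population standard deviation) for every feature column of every class.
    params = {
        label: [(statistics.fmean(col), statistics.pstdev(col)) for col in zip(*rows)]
        for label, rows in rows_by_label.items()
    }

    def generate(label: str, n: int) -> List[List[float]]:
        return [[rng.gauss(mu, sigma) for mu, sigma in params[label]] for _ in range(n)]

    return generate


def augment_minority_classes(
    data: Sequence[Sample], generate: GenerateFn, target_count: int
) -> List[Sample]:
    """Append synthetic samples to every class with fewer than target_count rows.

    Real samples are kept unchanged and come first; classes already at or above
    target_count receive no synthetic data.
    """
    counts = Counter(label for _, label in data)
    augmented = list(data)
    for label, count in counts.items():
        deficit = target_count - count
        if deficit > 0:
            augmented.extend((features, label) for features in generate(label, deficit))
    return augmented


if __name__ == "__main__":
    # Toy imbalanced set: 8 "normal" rows versus 2 "u2r" rows.
    train = [([0.1, 0.2], "normal")] * 8 + [([0.9, 0.8], "u2r"), ([1.0, 0.7], "u2r")]
    balanced = augment_minority_classes(train, make_gaussian_stand_in(train, seed=42), 8)
    print(sorted(Counter(label for _, label in balanced).items()))
    # prints [('normal', 8), ('u2r', 8)]
```

Keeping the real samples intact and only appending synthetic rows for classes below the target matches the abstract's description of augmenting, rather than replacing, the training set. Swapping in a trained conditional GAN would, under this sketch, only require a function with the same `generate(label, n)` signature.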