Comparative Analysis of Supervised and Unsupervised Learning for Intrusion Detection in Network Logs
Abstract
The escalating complexity of network infrastructures and the growing sophistication of cyber threats call for robust, automated Intrusion Detection Systems (IDS). This article presents a comparative investigation of how effectively various Machine Learning and Deep Learning architectures detect anomalies in network logs. The methodology ranged from classic supervised and ensemble algorithms, such as Random Forest and XGBoost, to sequential Deep Learning approaches (LSTM, GRU) and unsupervised models based on latent reconstruction (VAE, DeepLog). The results show that supervised approaches significantly outperformed unsupervised methods in the analyzed context. The optimized XGBoost model set the performance benchmark, achieving a Recall of 0.96 and a Precision of 0.85, and thereby offering a favorable balance between detecting rare threats and minimizing false alarms. In contrast, the unsupervised models revealed critical limitations, suggesting that statistical mimicry between normal and anomalous traffic hinders detection based solely on reconstruction error. The study also documents the technical interoperability challenges encountered when attempting to integrate state-of-the-art language models, such as BERT. In conclusion, this work validates Gradient Boosting algorithms and recurrent networks as viable and scalable solutions for critical network security, and provides guidelines for model selection in real monitoring environments.
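The balance between Recall and Precision mentioned above can be summarized with an F-beta score, the weighted harmonic mean of the two. The sketch below is illustrative only: the helper name `f_beta` is hypothetical, and the abstract reports only Precision and Recall, so any combined score is a derived quantity rather than a result stated by the authors.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1 weights both equally (F1); beta > 1 favors recall,
    which is often preferred when missed intrusions are costly.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if beta <= 0:
        raise ValueError("beta must be positive")
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention: score is 0 when nothing is detected correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values taken from the abstract for the optimized XGBoost model.
xgb_precision = 0.85
xgb_recall = 0.96

f1 = f_beta(xgb_precision, xgb_recall)            # equal weighting
f2 = f_beta(xgb_precision, xgb_recall, beta=2.0)  # recall-weighted (assumed choice)
print(f"F1 = {f1:.3f}, F2 = {f2:.3f}")
```

With the reported values, the derived F1 is roughly 0.90. The recall-weighted F2 variant is shown only as an assumption about how an operator might prioritize rare threats; it is not a metric named in the abstract.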