Machine Learning for Lateral Movement Detection using Sysmon Logs: An Empirical Comparison of Imbalanced and Resampled Data

Abstract

Lateral Movement (LM) represents a growing threat, frequently employed by advanced persistent threat groups to escalate privileges and navigate systems towards high-value assets. Recognizing the limitations of the existing literature, this work leverages the unique LM-focused LMD-2023 imbalanced benchmark dataset, which comprises Microsoft Windows Sysmon logs, to provide a multifaceted novel contribution to the LM Intrusion Detection System (IDS) domain. We investigate the impact of various open-source over- and undersampling balancing techniques on the performance of LM IDS frameworks. Specifically, we address the research question: how does the sample distribution within a benchmark dataset affect the performance evaluation metrics of LM-oriented IDS models, whether shallow or deep neural network (DNN) based? To this end, we adopt a multiclass supervised approach, classifying network activity into Normal, Exploitation of Remote Services, and Exploitation of Hashing Techniques. We scrutinize the effect of oversampling, undersampling, and hybrid-sampling techniques across 13 machine learning algorithms, nine shallow and four DNN techniques, using the LMD-2023 corpus. Our key findings reveal that balanced versions of the dataset generally improved performance. Shallow models trained on resampled data achieved a marginal gain of approximately +0.05% in AUC and F1-score compared to the imbalanced scenario. Notably, DNN models exhibited a more substantial performance gain of around 3.5% across most balancing techniques. Furthermore, analysis of the False Positive Rate (FPR) and False Negative Rate (FNR) revealed crucial trade-offs: while some balanced datasets led to a near-zero FNR with ensemble methods such as Bagging, others, particularly with DNNs and techniques such as ADASYN, showed a higher propensity for false alarms. These observations underscore the critical role of data balancing in optimizing LM IDS performance and highlight the varying impact of different techniques on the FPR/FNR trade-off for shallow versus deep learning models.
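
As a rough illustration of the balanced-versus-imbalanced comparison the abstract describes, the sketch below (not taken from the paper) contrasts a shallow classifier trained on raw imbalanced data with one trained on ADASYN-resampled data, using the open-source scikit-learn and imbalanced-learn libraries. The synthetic three-class dataset, the Random Forest model, and all parameter values are illustrative assumptions standing in for the LMD-2023 classes and the paper's 13 algorithms.

```python
# Minimal sketch (assumptions, not the authors' pipeline): compare a shallow
# model trained on imbalanced vs. ADASYN-oversampled multiclass data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import ADASYN

# Synthetic heavily imbalanced 3-class data, a stand-in for the Normal /
# Exploitation of Remote Services / Exploitation of Hashing Techniques labels.
X, y = make_classification(n_samples=20000, n_features=10, n_informative=6,
                           n_classes=3, weights=[0.90, 0.07, 0.03],
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

def evaluate(X_train, y_train, tag):
    """Train on the given (possibly resampled) data, report macro F1 and OvR AUC."""
    clf = RandomForestClassifier(n_estimators=200, random_state=42)
    clf.fit(X_train, y_train)
    f1 = f1_score(y_te, clf.predict(X_te), average="macro")
    auc = roc_auc_score(y_te, clf.predict_proba(X_te), multi_class="ovr")
    print(f"{tag}: macro-F1={f1:.4f}  OvR-AUC={auc:.4f}")

evaluate(X_tr, y_tr, "imbalanced baseline")
# ADASYN oversamples the minority classes before training (one of the
# balancing techniques named in the abstract).
X_bal, y_bal = ADASYN(random_state=42).fit_resample(X_tr, y_tr)
evaluate(X_bal, y_bal, "ADASYN-balanced")
```

Other resamplers from the same library (e.g. SMOTE, RandomUnderSampler, or hybrid methods) can be swapped into the `fit_resample` step to reproduce the kind of over-, under-, and hybrid-sampling comparison the study reports; FPR and FNR per class would additionally require a confusion-matrix breakdown.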
