Machine Learning for Lateral Movement Detection using Sysmon Logs: An Empirical Comparison of Imbalanced and Resampled Data

Abstract

Lateral Movement (LM) represents a growing threat, frequently employed by advanced persistent threat groups to escalate privileges and navigate systems towards high-value assets. Recognizing the limitations of the existing literature, this work leverages the unique LM-focused LMD-2023 imbalanced benchmark dataset, which comprises Microsoft Windows Sysmon logs, to provide a multifaceted novel contribution to the LM Intrusion Detection System (IDS) domain. We investigate the impact of various open-source over- and undersampling balancing techniques on the performance of LM IDS frameworks. Specifically, we address the research question: how does the sample distribution within a benchmark dataset affect the performance evaluation metrics of LM-oriented IDS models, whether shallow or deep neural network (DNN) based? To this end, we adopt a multiclass supervised approach, classifying network activity into Normal, Exploitation of Remote Services, and Exploitation of Hashing Techniques. We scrutinize the effect of oversampling, undersampling, and hybrid-sampling techniques across 13 machine learning algorithms, nine shallow and four DNN techniques, using the LMD-2023 corpus. Our key findings reveal that balanced versions of the dataset generally improved performance. Shallow models trained on resampled data achieved a marginal gain of approximately +0.05% in AUC and F1-score compared to the imbalanced scenario. Notably, DNN models exhibited a more substantial performance gain of around 3.5% across most balancing techniques. Furthermore, analysis of the False Positive Rate (FPR) and False Negative Rate (FNR) revealed crucial trade-offs: while some balanced datasets led to a near-zero FNR with ensemble methods such as Bagging, others, particularly with DNNs and techniques such as ADASYN, showed a higher propensity for false alarms. These observations underscore the critical role of data balancing in optimizing LM IDS performance and highlight the varying impact of different techniques on the FPR/FNR trade-off for shallow versus deep learning models.
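
As a rough illustration of the balanced-versus-imbalanced comparison the abstract describes, the sketch below (not taken from the paper) contrasts a shallow classifier trained on raw imbalanced data with one trained on ADASYN-resampled data, using the open-source scikit-learn and imbalanced-learn libraries. The synthetic three-class dataset, the Random Forest model, and all parameter values are illustrative assumptions standing in for the LMD-2023 classes and the paper's 13 algorithms.

```python
# Minimal sketch (assumptions, not the authors' pipeline): compare a shallow
# model trained on imbalanced vs. ADASYN-oversampled multiclass data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import ADASYN

# Synthetic heavily imbalanced 3-class data, a stand-in for the Normal /
# Exploitation of Remote Services / Exploitation of Hashing Techniques labels.
X, y = make_classification(n_samples=20000, n_features=10, n_informative=6,
                           n_classes=3, weights=[0.90, 0.07, 0.03],
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

def evaluate(X_train, y_train, tag):
    """Train on the given (possibly resampled) data, report macro F1 and OvR AUC."""
    clf = RandomForestClassifier(n_estimators=200, random_state=42)
    clf.fit(X_train, y_train)
    f1 = f1_score(y_te, clf.predict(X_te), average="macro")
    auc = roc_auc_score(y_te, clf.predict_proba(X_te), multi_class="ovr")
    print(f"{tag}: macro-F1={f1:.4f}  OvR-AUC={auc:.4f}")

evaluate(X_tr, y_tr, "imbalanced baseline")
# ADASYN oversamples the minority classes before training (one of the
# balancing techniques named in the abstract).
X_bal, y_bal = ADASYN(random_state=42).fit_resample(X_tr, y_tr)
evaluate(X_bal, y_bal, "ADASYN-balanced")
```

Other resamplers from the same library (e.g. SMOTE, RandomUnderSampler, or hybrid methods) can be swapped into the `fit_resample` step to reproduce the kind of over-, under-, and hybrid-sampling comparison the study reports; FPR and FNR per class would additionally require a confusion-matrix breakdown.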
