SHAP-Guided Feature Refinement for Robust and Interpretable Malware Detection in Memory Forensics


Abstract

The growing sophistication of malware poses a major challenge to traditional detection methods, especially those that rely on static analysis and signature-based techniques. This study investigates a memory forensics-driven framework that combines feature engineering, explainability, and adversarial robustness analysis for malware detection. Using the CIC MalMem-2022 dataset, we analyzed 55 memory-resident features and refined them to 13 through mutual information and a novel SHAP-guided feature refinement (SHAP-GFR) process. We evaluated five models, namely Random Forest, XGBoost, Multilayer Perceptron (MLP), one-dimensional Convolutional Neural Network (1D CNN), and a CNN–Long Short-Term Memory (CNN-LSTM) hybrid, under a 70:15:15 train/validation/test split. All models achieved high accuracy on clean data, with XGBoost reaching 99.98% at minimal latency (4.68 ms). Our SHAP and LIME analyses showed that service- and handle-related features were key to malware identification. Robustness testing under Gaussian noise, the Fast Gradient Sign Method (FGSM), and Projected Gradient Descent (PGD) revealed that the performance of tree-based models decreased substantially, whereas the convolutional architecture (1D CNN) maintained strong performance (≈ 3% F1 drop). Leave-One-Family-Out (LOFO) validation demonstrated strong cross-family generalization. Our findings showed that tree ensembles were better at detecting attacks from entirely new malware families, while CNNs proved more effective against adversarial obfuscation. This work therefore establishes a new framework for evaluating malware detection systems. By focusing on efficiency, interpretability, and robustness, we provide a more comprehensive set of guidelines for building systems that are not only accurate but also lightweight and resilient enough for real-world deployment.
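
To make the FGSM robustness test mentioned above concrete, the following is a minimal sketch of a single FGSM step, x_adv = x + eps * sign(dL/dx), applied to a toy logistic-regression detector. It is not the authors' experimental setup: the weights `w`, bias `b`, sample `x`, and budget `eps` are hypothetical values chosen only for illustration, and the paper's models and attack parameters are not given in the abstract.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of the positive (malware) class under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """One FGSM step that increases the binary cross-entropy loss on (x, y)."""
    p = predict_proba(w, b, x)
    # For logistic regression with cross-entropy loss, dL/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    # Move every feature by exactly eps in the direction that raises the loss.
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and sample (assumptions, not values from the paper).
w = [2.0, -1.0, 0.5]
b = -0.5
x = [1.0, 0.2, 0.4]
y = 1  # true label: malware

x_adv = fgsm(w, b, x, y, eps=0.1)
p_clean = predict_proba(w, b, x)
p_adv = predict_proba(w, b, x_adv)
print(f"clean p(malware) = {p_clean:.4f}, adversarial p(malware) = {p_adv:.4f}")
```

The perturbation stays within an L-infinity ball of radius `eps` around the original sample while lowering the model's confidence in the true label. PGD can be viewed as repeating such a step several times with a projection back into that ball after each one.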
