Explainable and Adversarial Robust Deep Learning for Malware Campaigns Forensic Attribution


Abstract

This research paper proposes a Random Forest-based machine learning framework for malware attribution in digital forensics. The model takes malware samples as input and classifies them into attribution labels such as APT1, APT28, APT33, and CyberGangX. The features include File_Size_KB, Num_Functions, Num_Imports, and Entropy. The model was trained on a dataset of 5,000 samples, whose features span a wide range of values: File_Size_KB varies from 11 KB to 4,998 KB, and Entropy varies from 1.5 to 8. The model's accuracy was 19.7%, with precision, recall, and F1-score averaging around 20% across attribution classes. According to the Random Forest feature importance plot, the key features were File_Size_KB, Num_Functions, and Entropy. Methodologically, the model was trained with a standard 80/20 train/test split; features were normalized with StandardScaler, and categorical labels were converted to numerical values with LabelEncoder. A decline in performance was observed during adversarial robustness testing, with the F1-score dropping from 20% on clean data to 15% on adversarially perturbed data. The model also suffers from class imbalance, which biases its predictions toward the most represented classes, such as CyberGangX and Unknown. While the model performed better on some classes (e.g., APT1), it had low precision and recall on many others. The authors note a further robustness challenge: the model can be fooled by small perturbations. In summary, while the proposed model can strengthen existing malware attribution processes, its scalability, performance, and adversarial defenses need improvement. Future work should focus on hyperparameter tuning, stronger model selection, and adversarial training to improve robustness.
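The described pipeline (80/20 split, StandardScaler normalization, LabelEncoder for labels, Random Forest classifier) can be sketched as follows. This is a minimal illustration, not the authors' code: the synthetic feature values, class labels, and default hyperparameters are assumptions standing in for the unavailable dataset.

```python
# Minimal sketch of the described attribution pipeline.
# Synthetic data stands in for the paper's 5,000-sample dataset (assumption).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 5000
# Synthetic stand-ins for the four described features.
X = np.column_stack([
    rng.uniform(11, 4998, n),   # File_Size_KB
    rng.integers(1, 500, n),    # Num_Functions
    rng.integers(1, 200, n),    # Num_Imports
    rng.uniform(1.5, 8.0, n),   # Entropy
])
labels = rng.choice(["APT1", "APT28", "APT33", "CyberGangX", "Unknown"], n)

# Categorical labels -> integers, then a standard 80/20 split.
y = LabelEncoder().fit_transform(labels)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Normalize features, fit the Random Forest, evaluate macro F1.
scaler = StandardScaler().fit(X_train)
clf = RandomForestClassifier(random_state=0)
clf.fit(scaler.transform(X_train), y_train)
pred = clf.predict(scaler.transform(X_test))
print(f"macro F1: {f1_score(y_test, pred, average='macro'):.3f}")
```

On features that carry no real signal, as in this synthetic stand-in, scores near chance (~20% for five classes) are expected, which mirrors the reported performance.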
Further, feature engineering and the inclusion of network traffic data could enhance model performance by increasing accuracy and enabling the classification of more complex malware. The findings indicate the importance of developing predictive models that are both accurate and interpretable, which will help cybersecurity professionals and law enforcement agencies in the field of digital forensics.
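The adversarial robustness test mentioned in the abstract can be approximated with a simple perturbation check: compare F1 on clean test features against F1 on features shifted by bounded random noise. The paper does not specify its attack, so the uniform-noise perturbation and the epsilon budget below are assumptions for illustration.

```python
# Hedged sketch of a robustness check: small bounded noise on
# standardized test features, then compare clean vs. perturbed F1.
# The noise model and epsilon are assumptions, not the paper's attack.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 4))          # synthetic stand-in features
y = rng.integers(0, 5, 5000)            # five attribution classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

scaler = StandardScaler().fit(X_tr)
clf = RandomForestClassifier(random_state=1).fit(scaler.transform(X_tr), y_tr)

X_te_s = scaler.transform(X_te)
eps = 0.1  # perturbation budget in standardized units (assumed)
X_adv = X_te_s + rng.uniform(-eps, eps, X_te_s.shape)

clean_f1 = f1_score(y_te, clf.predict(X_te_s), average="macro")
adv_f1 = f1_score(y_te, clf.predict(X_adv), average="macro")
print(f"clean F1={clean_f1:.3f}  perturbed F1={adv_f1:.3f}")
```

A drop between the two scores, like the reported 20% to 15% decline, indicates sensitivity to small input perturbations; adversarial training or smoothing-based defenses are the usual responses.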
