Enhancing malware detection reliability in non-executable files using confidence score prediction

Rasoul Rezvani-Jalal
Morteza Zakeri
Saeed Parsa
Amin Hasan-Zarei

Abstract

Malware attacks targeting widely used non-executable formats, namely Microsoft Office and PDF files, have become a prevalent threat. These files, which encompass a broad spectrum of data types, are classified as complex files. Existing malware detection models lack transparency, providing only binary labels without confidence scores. Incorporating a confidence score enhances interpretability and detection accuracy. This article proposes a learning-based malware detection approach with two complementary parts. The first part develops binary classifiers, built on an enriched dataset of related files with an extended feature set, to achieve high accuracy. The second part employs regression models to assign a confidence score to each sample. To label samples with confidence scores accurately, a reliability score is assigned to each of the various antiviruses. On completion of the detection process, a pair (x, y) is provided, where x is the binary classifier output and y is the regressor output, i.e., the confidence score. Our findings demonstrate an improvement over existing malware detection classifiers of approximately 2.44% for PDF files and 2.27% for MS Office files. Using the confidence score along with binary classification boosts detection accuracy to 99.74% for PDFs and 99.77% for Office files.
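The two-part output described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how antivirus verdicts are combined into a confidence label, so the reliability-weighted vote below is an assumption, as are the engine names, the reliability values, and the function names `confidence_label` and `detection_pair`. The classifier and regressor themselves are not modelled here; their outputs are passed in as plain numbers.

```python
from typing import Dict, Tuple


def confidence_label(verdicts: Dict[str, bool], reliability: Dict[str, float]) -> float:
    """Reliability-weighted share of engines that flag the sample, in [0, 1].

    Assumed labelling rule: the source only says that reliability scores
    are assigned to antiviruses, not how they are combined.
    """
    total = sum(reliability[name] for name in verdicts)
    if total <= 0:
        raise ValueError("at least one engine needs a positive reliability score")
    flagged = sum(reliability[name] for name, hit in verdicts.items() if hit)
    return flagged / total


def detection_pair(classifier_output: int, regressor_output: float) -> Tuple[int, float]:
    """Bundle the binary label x with the confidence score y, clamped to [0, 1]."""
    return classifier_output, min(1.0, max(0.0, regressor_output))


# Hypothetical engines and reliability scores, for illustration only.
reliability = {"engine_a": 0.9, "engine_b": 0.6, "engine_c": 0.5}
verdicts = {"engine_a": True, "engine_b": True, "engine_c": False}

# Training target for the regressor: (0.9 + 0.6) / (0.9 + 0.6 + 0.5)
y_train = confidence_label(verdicts, reliability)

# At detection time the classifier gives x and the regressor gives y.
x, y = detection_pair(classifier_output=1, regressor_output=0.93)
```

Under this assumed scheme, a sample flagged only by low-reliability engines receives a low confidence label even when most engines flag it, which is one plausible reason to weight engines rather than count them.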

Version published to 10.21203/rs.3.rs-6418723/v1 on Research Square
May 15, 2025

Related articles

System Call-Based Malware Detection Using Advanced Machine Learning Techniques

This article has 2 authors:
1. Nana Kwame Gyamfi
2. Nikolaj Goranin
This article has no evaluations. Latest version: Jun 30, 2025

AI-Powered Defect Prediction: From Code Smells to Failure Forecasting

This article has 5 authors:
1. Md Mostafizur Rahman
2. Md Mostafijur Rahman
3. Maria Khatun Shuvra
4. Md Mashfiquer Rahman
5. Najmul Gony
This article has no evaluations. Latest version: Jun 9, 2025

Machine Learning-Based Vulnerability Detection in Rust Code Using LLVM IR and Transformer Model

This article has 5 authors:
1. Young Lee
2. Syeda Jannatul Boshra
3. Jeong Yang
4. Zechun Cao
5. Gongbo Liang
This article has no evaluations. Latest version: Jun 10, 2025
