Feature-Driven Malware Detection using Cascade Machine Learning Models

Abstract

Malware proliferation continues to jeopardize global data security and user privacy, necessitating robust detection and classification mechanisms. In this research, we propose the Malware Detection using Cascade Machine Learning (MDCML) classifier, designed to detect anomalies in Portable Executable (PE) files and classify them into malware families with high precision. The model integrates three machine learning algorithms (Random Forest, Bagging, and Boosting), fine-tuned through extensive hyperparameter optimization to improve detection and classification performance. To extract features from raw textual data, we use a TF-IDF-based inter-class dispersion architecture that transforms unstructured opcode data into structured feature maps emphasizing contextual importance. The model employs gradient descent with regularization to iteratively minimize the loss function and prevent overfitting, achieving sublinear regret and convergence toward optimal performance. The proposed model is validated on the public BIG 2015 dataset, which includes approximately 10,000 files spanning nine malware families. The study comprises experiments on both binary classification (malware vs. benign) and multi-class classification tasks. Performance was evaluated across diverse sample sizes, execution times, and optimization strategies to ensure robust analysis. The proposed framework achieves an accuracy of 98.97%, outperforming traditional machine learning models. This research underscores the value of the hybrid MDCML classifier in improving malware detection and classification, thereby enhancing data security and privacy.
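
To make the feature-extraction step concrete, the following is a minimal sketch of plain TF-IDF weighting over opcode sequences. It is not the authors' inter-class dispersion variant, whose exact formula the abstract does not give. The function name `tfidf`, the smoothed IDF form, and the toy opcode samples are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {opcode: weight} dict per document (plain TF-IDF, smoothed IDF)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each opcode.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Term frequency times smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy opcode sequences, not taken from the BIG 2015 dataset.
samples = [
    ["mov", "push", "call", "mov", "ret"],
    ["mov", "xor", "xor", "jmp", "ret"],
    ["push", "call", "call", "mov", "ret"],
]
features = tfidf(samples)
```

In this sketch, an opcode that is frequent within one sample but rare across the corpus (here `xor` in the second sample) receives a higher weight than an opcode that appears in every sample (such as `ret`). Feature vectors of this kind could then be passed to ensemble classifiers such as Random Forest, Bagging, or Boosting.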
