Comparative Study of Natural Language Processing Models for Malware Detection Using API Call Sequences
Abstract
In the evolving landscape of cybersecurity, the manual and time-consuming process of identifying malware remains a major bottleneck in security analysis. This study addresses that challenge by leveraging Natural Language Processing (NLP) techniques. It presents a comparative analysis of two neural networks, a Long Short-Term Memory (LSTM) model and a Transformer model, that analyze API call sequences and capture the relationships between API calls. Using a publicly available dataset, both models perform binary malware detection (malicious vs. benign). The experimental findings demonstrate that the NLP-based paradigm is highly effective: the Transformer model consistently outperformed the LSTM model, achieving 95.54% accuracy in distinguishing malware from benign samples. The Transformer's success highlights the advantage of the attention mechanism in capturing long-range dependencies and deciphering complex malicious patterns in behavioral sequences. By representing system-level API calls as a linguistic structure, this approach establishes an efficient and dynamic framework for malware detection, aiding cybersecurity threat response.
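The core framing described above, treating a trace of API calls as a "sentence" whose tokens are API names, can be illustrated with a minimal sketch. This is not code from the study: the API names, vocabulary scheme, and sequence length below are illustrative assumptions, showing only how behavioral traces might be encoded into fixed-length token-ID sequences before being fed to an LSTM or Transformer.

```python
# Minimal illustrative sketch (not the paper's implementation): encode API
# call sequences as token IDs, the standard NLP-style preprocessing step.
# API names, max_len, and the padding convention are assumptions.

def build_vocab(sequences):
    """Map each distinct API call name to an integer ID; 0 is reserved for padding."""
    vocab = {"<PAD>": 0}
    for seq in sequences:
        for api in seq:
            vocab.setdefault(api, len(vocab))
    return vocab

def encode(seq, vocab, max_len=8):
    """Convert one API call sequence into a fixed-length list of token IDs."""
    ids = [vocab.get(api, 0) for api in seq[:max_len]]
    return ids + [0] * (max_len - len(ids))  # right-pad to max_len

# Hypothetical behavioral traces: one benign-looking, one suspicious-looking.
traces = [
    ["CreateFile", "ReadFile", "CloseHandle"],
    ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"],
]
vocab = build_vocab(traces)
encoded = [encode(t, vocab) for t in traces]
print(encoded[0])  # -> [1, 2, 3, 0, 0, 0, 0, 0]
```

Sequences encoded this way become the input to an embedding layer, after which an LSTM processes tokens in order while a Transformer's attention mechanism can relate any two positions directly, which is the property the abstract credits for the Transformer's stronger results.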