Machine Learning-Based Vulnerability Detection in Rust Code Using LLVM IR and Transformer Model



Abstract

Rust’s growing popularity in high-integrity systems calls for automated vulnerability detection to uphold its strong safety guarantees. Although Rust’s ownership model and compile-time checks prevent many errors, some bugs can still slip past these checks, which underlines the need to automatically classify code as safe or vulnerable. This paper presents Rust-IR-BERT, a machine learning approach that detects security vulnerabilities in Rust code by analyzing its compiled LLVM intermediate representation (IR) instead of the raw source code. LLVM IR offers a language-neutral, semantically rich view of the program that captures data and control flow and reduces the noise introduced by differences in high-level syntax. Our method uses a transformer model, GraphCodeBERT, to embed the IR and a CatBoost classifier to label the code as vulnerable or safe. We evaluate it on a diverse dataset of over 2,300 Rust snippets compiled to LLVM IR, mixing known buggy (CVE-linked) code with safe code and covering a wide range of real-world crates. On this mix, the method achieves 98.11% overall accuracy, with a recall of 99.31% for safe code and 93.67% for vulnerable code.
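The three reported figures are linked: in a binary task, overall accuracy equals the per-class recalls averaged with weights given by each class's share of the test set, so acc = (1 − s)·R_safe + s·R_vuln, where s is the vulnerable share. The Python sketch below is illustrative only. The function names are hypothetical and do not come from the paper. It computes both metrics on toy labels and inverts that relation. Assuming all three percentages come from the same test set, they imply that roughly 21% of the test samples were vulnerable, which explains why the overall accuracy sits close to the safe-class recall.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted instances / true instances of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in totals.items()}


def implied_vulnerable_share(acc, recall_safe, recall_vuln):
    """Solve acc = (1 - s) * recall_safe + s * recall_vuln for s,
    the fraction of test samples whose true label is 'vulnerable'.
    Hypothetical helper: assumes all three metrics share one binary test set."""
    return (recall_safe - acc) / (recall_safe - recall_vuln)


if __name__ == "__main__":
    # Toy labels (not the paper's data): 3 safe samples, 1 vulnerable.
    y_true = ["safe", "safe", "safe", "vuln"]
    y_pred = ["safe", "safe", "vuln", "vuln"]
    print(accuracy(y_true, y_pred))          # 0.75
    print(per_class_recall(y_true, y_pred))  # safe ≈ 0.667, vuln = 1.0

    # Figures reported in the abstract, expressed as fractions.
    share = implied_vulnerable_share(0.9811, 0.9931, 0.9367)
    print(f"implied vulnerable share: {share:.3f}")  # prints 0.213
```

Because the published percentages are rounded to two decimals, the implied share is approximate. It is a consistency check derived from the abstract's numbers, not a class split reported by the authors.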
