CQLLM: A Framework for Generating CodeQL Security Vulnerability Detection Code Based on Large Language Model

Le Wang
Chan Chen
Junyi Zhu
Rufeng Zhan
Weihong Han

Abstract

With the increasing complexity of software systems, the number of security vulnerabilities they contain has risen accordingly. The shift-left security concept aims to detect and fix vulnerabilities early in the software development cycle. Although CodeQL is the premier static code analysis tool currently available on the market, its high barrier to entry makes it difficult to meet the implementation requirements of shift-left security initiatives. Large language models (LLMs) can assist in QL code development, but the inherent complexity of code generation often leads to persistent problems such as syntax errors and references to non-existent modules, which limits their practical applicability in this domain. To address these challenges, this paper proposes CQLLM, a novel framework that leverages LLMs to automate the generation of CodeQL security vulnerability detection code. The framework is designed to improve both the efficiency and the accuracy of automated QL code generation, moving static code analysis toward a more efficient and intelligent vulnerability detection paradigm.

First, retrieval-augmented generation (RAG) searches a vector database for dependency libraries and code snippets that are highly similar to the user's input; the retrieved material constrains the model's generation process and prevents the import of invalid modules. Then, the user input and the knowledge chunks retrieved by RAG are fed into a fine-tuned LLM, which reasons over them and generates QL code. By integrating external knowledge bases with the large model, the framework improves the correctness and completeness of the generated code.

Experimental results show that CQLLM substantially improves the executability of the generated QL code, raising the execution success rate from 0.31% to 72.48% and outperforming the original model by a large margin. CQLLM also improves the effectiveness of the generated code, achieving a CWE coverage rate of 57.4% in vulnerability detection tasks, which demonstrates its practical applicability to real-world vulnerability detection.
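The retrieval step described in the abstract (find the knowledge chunks most similar to the user's request, then place them in front of the request before the LLM generates QL code) can be illustrated with a minimal sketch. It assumes a toy token-count embedding in place of a real embedding model and a small in-memory store in place of the vector database; the `VectorStore` class, the chunk texts and the prompt template are hypothetical and are not taken from CQLLM, whose embedding model, database and prompt are not described here. The two CodeQL module paths in the chunks are real library imports, used only as illustrative knowledge.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    """One knowledge chunk (hypothetical): a short description plus CodeQL imports."""
    text: str
    vector: Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts, standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VectorStore:
    """Hypothetical in-memory stand-in for the vector database (brute-force top-k search)."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def add(self, text: str) -> None:
        self.chunks.append(Chunk(text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the texts of the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c.vector), reverse=True)
        return [c.text for c in ranked[:k]]


def build_prompt(request: str, store: VectorStore, k: int = 2) -> str:
    """Prepend retrieved chunks so the model is steered toward modules that exist."""
    context = "\n\n".join(store.search(request, k))
    return (
        "Use only the CodeQL modules and patterns shown in the context.\n"
        f"### Context\n{context}\n\n"
        f"### Task\n{request}\n"
    )


# Usage with three illustrative chunks.
store = VectorStore()
store.add("python sql injection taint tracking\n"
          "import python\nimport semmle.python.dataflow.new.TaintTracking")
store.add("java taint tracking\n"
          "import java\nimport semmle.code.java.dataflow.TaintTracking")
store.add("c buffer overflow\nimport cpp")
prompt = build_prompt("Write a CodeQL query for SQL injection in Python", store, k=1)
print(prompt)
```

Here the Python chunk is the only one sharing tokens with the request, so the prompt handed to the model contains only the Python taint-tracking imports. Brute-force scoring is adequate for a handful of chunks; a production vector database would typically use an approximate nearest-neighbour index instead.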

Version published to 10.20944/preprints202510.1458.v1
Oct 20, 2025

Related articles

Automated IoT Firmware Vulnerability Detection Using Large Language Models

This article has 4 authors:
1. Sushant Mane
2. Jai Bhortake
3. Vidhi Wankhade
4. Faruk Kazi
This article has no evaluations. Latest version Oct 2, 2025

A Pattern-Oriented Ontology and Workflow Modeling Approach for the Sui Move Programming Language

This article has 2 authors:
1. Antonios Giatzis
2. Christos K. Georgiadis
This article has no evaluations. Latest version Oct 28, 2025

Hybrid Fuzzing with LLM-Guided Input Mutation and Semantic Feedback

This article has 1 author:
1. Shiyin Lin
This article has no evaluations. Latest version Sep 23, 2025