Enhancing search-based testing with LLMs for finding bugs in system simulators

Abstract

Despite the wide availability of automated testing techniques such as fuzzing, little attention has been devoted to testing computer architecture simulators. We propose a fully automated approach for this task. Our approach uses large language models (LLMs) to generate input programs, including information about their parameters and types, as test cases for the simulators. The LLM's output becomes the initial seed for an existing fuzzer, which has been enhanced with three mutation operators targeting both the input binary program and its parameters. We implement our approach in a tool and use it to test a system simulator. The tool discovered 21 new bugs in the simulator: 14 where the simulator's prediction differs from the real behaviour on actual hardware, and 7 where the simulator crashed. New defects were uncovered with each of the 6 LLMs used.
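To illustrate the workflow the abstract describes, the following is a minimal sketch, assuming a hypothetical setup: an LLM-produced seed (a small binary program plus typed parameters), two illustrative mutation operators over the binary and its parameters, and a stand-in `run_simulator` harness. The names `llm_seed`, `mutate_binary`, `mutate_parameter`, and `run_simulator` are not from the paper or its tool; they exist only to show how such a loop could be structured.

```python
import random

# Hypothetical seed produced by an LLM: an input program plus metadata
# describing its parameters and their types, as described in the abstract.
llm_seed = {
    "program": b"\x48\x31\xc0\xc3",  # placeholder machine-code bytes
    "params": {"num_cpus": 2, "mem_size_mb": 512, "cache_line": 64},
    "types": {"num_cpus": int, "mem_size_mb": int, "cache_line": int},
}


def mutate_binary(program: bytes) -> bytes:
    """Flip one random bit in the input binary (one possible mutation operator)."""
    if not program:
        return program
    data = bytearray(program)
    idx = random.randrange(len(data))
    data[idx] ^= 1 << random.randrange(8)
    return bytes(data)


def mutate_parameter(params: dict, types: dict) -> dict:
    """Perturb one typed simulator parameter (another possible mutation operator)."""
    mutated = dict(params)
    name = random.choice(list(mutated))
    if types[name] is int:
        mutated[name] = max(0, mutated[name] + random.choice([-1, 1]) * random.randrange(1, 16))
    return mutated


def run_simulator(program: bytes, params: dict) -> str:
    """Stand-in for the simulator under test; a real harness would execute the
    binary in the simulator and compare the result against real hardware."""
    return f"ran {len(program)} bytes with {params}"


# Minimal fuzzing loop: each round mutates both the binary and its parameters,
# starting from the LLM-generated seed.
test_case = llm_seed
for _ in range(10):
    program = mutate_binary(test_case["program"])
    params = mutate_parameter(test_case["params"], test_case["types"])
    print(run_simulator(program, params))
```

In the actual approach, the fuzzer's feedback would guide which mutated test cases are kept, and divergences between the simulator's output and real hardware would be flagged as potential bugs; this sketch omits both steps for brevity.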
