Reducing MISRA violations in LLM-generated code by 83%: An empirical study with static analysis verification
Abstract
Large Language Models (LLMs) are increasingly used for C++ code generation, yet their ability to satisfy Motor Industry Software Reliability Association (MISRA) C++:2023 guidelines at scale remains unclear. We conduct a controlled before–after, repeated-measures study on 26 C++ tasks, evaluating four models with 20 runs per condition. Compiled outputs are checked with a complete MISRA C++:2023 ruleset. Verbose rule texts are distilled into compact, actionable Top-k instruction packs (k=3, 5, 10) targeting each model's most frequent violations. Primary outcomes are violations per thousand lines of code (KLOC), compile rate, and pass rate. At baseline, models cluster at 23–29 violations/KLOC, dominated by an advisory rule discouraging standard integer type names. Adding Top-k instructions reduces violations by 44–83% across models (paired permutation tests, all p < 0.01); GPT-5 and o3 reach 3.9–4.5 violations/KLOC. Functional impacts are small overall; two conditions show significant pass-rate declines (GPT-4.1/Top-3, o3/Top-10). Improvements spill over to non-targeted rules. Compact, model-aware MISRA prompts therefore offer a practical path to safer C++ generation with limited functional cost when scoped appropriately. However, full verification still requires dedicated compliance tooling to detect residual issues, quantify results for certification, and produce auditable evidence for regulators. Practitioners should adopt a step-up strategy: start with Top-3 or Top-5 rules, monitor compile and pass rates, and expand only when results are stable. Study artifacts are released to enable replication and reuse.