Multi-Sallm: A Multilingual Security Assessment of Generated Code
Abstract
As Large Language Models (LLMs) become increasingly integrated into software engineers' daily workflows, it is critical to ensure that the code they generate is not only functionally correct but also secure. While LLMs can boost developer productivity, prior empirical studies have shown that they often produce insecure code. This issue stems from two key factors. First, the datasets commonly used to evaluate LLMs do not accurately reflect real-world software engineering tasks where security is a concern. Instead, they tend to focus on competitive programming problems or classroom-style exercises, which lack the complexity and security risks of production code integrated into larger systems. Second, current evaluation metrics emphasize functional correctness and largely overlook security. To address these gaps, we introduce Multi-Sallm, a benchmarking framework designed to systematically evaluate LLMs' ability to generate secure code. The framework includes three main components: (1) a novel dataset of security-focused Python prompts translated into 23 natural languages, (2) configurable assessment techniques for analyzing generated code, and (3) new metrics that assess models from the perspective of secure code generation.
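To make the metric component more concrete, below is a minimal Python sketch of a secure@k-style estimator, computed analogously to the widely used pass@k estimator; the function name secure_at_k and the assumption that the framework's assessment techniques label each generated sample as secure or insecure are illustrative assumptions, not specifics stated in this abstract.

from math import comb

def secure_at_k(n: int, s: int, k: int) -> float:
    """Unbiased estimate of the probability that at least one of k samples,
    drawn without replacement from n generations, is secure, where s of the
    n generations were judged secure by the security analyzers.

    Mirrors the standard pass@k form: secure@k = 1 - C(n - s, k) / C(n, k).
    """
    if n - s < k:
        # Every possible size-k draw must contain at least one secure sample.
        return 1.0
    return 1.0 - comb(n - s, k) / comb(n, k)

# Hypothetical usage: 10 generations per prompt, 4 judged secure.
print(secure_at_k(n=10, s=4, k=1))  # 0.4
print(secure_at_k(n=10, s=4, k=5))  # ~0.976

Such a per-prompt score could then be averaged across the dataset's prompts (and across the 23 prompt languages) to compare models on secure code generation rather than functional correctness alone.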