SecuDevSLM: A Systematic Security Evaluation Framework for Small Language Models on Mobile Devices

Abstract

The deployment of Small Language Models (SLMs) on edge devices introduces critical security challenges, as their constrained architectures and heterogeneous runtime environments exacerbate vulnerabilities such as hallucination and jailbreaking, risks that remain understudied in resource-limited settings. To address this, we present SecuDevSLM, a fully on-device security evaluation framework for iOS and Android. Unlike traditional API-dependent approaches, SecuDevSLM eliminates dependence on external LLM APIs by leveraging a unified adversarial corpus for attack simulation, and it introduces an integrated detection mechanism that combines semantic noise entropy analysis with dual-modal evaluation, achieving a 9% higher F1-score in threat identification than conventional tools. It systematically tests SLM robustness across 15 threat categories using 27 adversarial templates, generating multi-turn hallucination, jailbreak, and noise-based attacks. In total, 58 mainstream SLMs undergo tens of thousands of attack rounds. Compared with API-based methods, SecuDevSLM avoids high interaction costs and mitigates data-leakage risks. Our experiments show that for semantically aligned attacks, training-corpus richness correlates positively with model robustness, though with diminishing returns; in weakly aligned attacks, overall model capacity plays the larger role. Platform differences also affect stability: under hallucination attacks, Android exhibits 23% greater character-level variance and 1.7\(\times\) higher fluctuation in response text length than iOS, although both platforms perform similarly on long-text hallucinations. For jailbreak attacks, Android shows 56% more output volatility, and when corpus richness is low its attack success rate reaches 85%, versus 10% on iOS. Finally, we confirm that static indicators, such as parameter count and dataset features, correlate strongly and linearly with real-world performance, enabling developers to assess the security and performance of models before integrating them into a system.
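The abstract names semantic noise entropy analysis as one of the framework's detection signals but does not define it here. As a rough, non-authoritative sketch of what such a signal could look like, the Python snippet below scores a model response by the Shannon entropy of its surface token distribution; the whitespace tokenization, function names, and the 4.5-bit threshold are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def token_entropy(tokens):
    """Shannon entropy (in bits) of the empirical token distribution."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def flag_noisy_response(response, threshold=4.5):
    """Flag a response whose token-level entropy exceeds a tuned threshold.

    High entropy over the surface token distribution serves here as a
    crude proxy for semantically noisy or degenerate output; the 4.5-bit
    threshold is a placeholder that would need calibration per model.
    """
    tokens = response.split()
    if not tokens:
        return True  # an empty response is treated as anomalous
    return token_entropy(tokens) > threshold

# Example: score one model response collected during an attack round
print(flag_noisy_response("The capital of France is Paris."))  # False
```

In a full pipeline, a score like this would presumably be combined with the dual-modal evaluation mentioned above rather than used as a standalone detector.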
