When Educational Prompts Become Security Risks: Evidence from K-12 AI Safety Testing

Abstract

As K-12 schools increasingly adopt large language models (LLMs) for teaching and learning, educators often rely on built-in safety guardrails to prevent harmful outputs. However, little empirical evidence exists on how these safeguards behave in realistic classroom scenarios. This study examines whether guardrails reliably prevent unsafe responses when prompts are framed as educational, curiosity-driven, or troubleshooting requests. We conducted a controlled evaluation of four common classroom-style prompt framings across two malware-related tasks (ransomware-note generation and keylogger code generation) using 24 publicly accessible LLMs. Each model-prompt interaction was evaluated for whether the output could reasonably be adapted for malicious use. Across models, overall success rates exceeded 70% for both tasks, indicating substantial baseline vulnerability. However, results diverged by task type. Ransomware requests were highly sensitive to framing, with troubleshooting prompts outperforming educational justifications by more than 20 percentage points. In contrast, keylogger requests showed consistently high success rates regardless of framing, suggesting framing-invariant vulnerability. These findings indicate that “for educational purposes” framing does not reliably mitigate risk and that AI safety performance in schools depends on the task, not on prompt framing alone. The study highlights the need for task-aware AI policies, teacher training, and more nuanced safety evaluations in K-12 educational environments.
