Adversarial Hallucination Engineering: Targeted Misdirection Attacks Against LLM-Powered Security Operations Centers
Abstract
Large Language Models (LLMs) are increasingly deployed in Security Operations Centers (SOCs) for alert triage and threat‑intelligence synthesis. We study Adversarial Hallucination Engineering (AHE): attacks that bias LLM reasoning by introducing small clusters of poisoned context into retrieval‑augmented generation (RAG) pipelines, producing targeted fabrications aligned with attacker goals. Using a safe, fully synthetic simulator of a RAG+LLM SOC, we formalize the AHE threat model, introduce Hallucination Propagation Chains (HPCs), sets of mutually reinforcing poisoned documents designed to create artificial consensus at retrieval time, and evaluate a lightweight defense, Chain‑of‑Thought Attestation (CoTA), based on per‑token uncertainty, provenance attribution, and source reputation. Across three model scales, the hallucination‑induction rate (HIR) rises superlinearly with HPC size: for a Large‑70B proxy, HIR grows from 12.45% to 61.84% as the number of HPC documents in the top‑k retrieval increases from 0 to 5, and the actionable misconfiguration rate (AMR) grows from 3.23% (no attack) to 38.18% (HPC‑5). CoTA reduces the attack‑success rate (ASR) by ~55% for HPC sizes ≥ 3, with ~7% false‑positive flags and ~8% latency overhead. We release synthetic artifacts to support reproducible, defensive research.
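To make the defense concrete, the sketch below illustrates one plausible way the three signals named for CoTA (per‑token uncertainty, provenance attribution, source reputation) could be combined into a per‑claim attestation check. All class names, thresholds, and the aggregation rule are illustrative assumptions for exposition; they are not the paper's implementation.

```python
# Hypothetical sketch of a CoTA-style attestation check. Only the three signals
# named in the abstract are used; everything else (names, thresholds, rule) is assumed.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AttributedSpan:
    text: str                      # generated claim in the triage output
    token_entropies: List[float]   # per-token uncertainty estimates for the span
    source_ids: List[str]          # retrieved documents the span is attributed to
    source_reputation: Dict[str, float]  # source_id -> reputation score in [0, 1]


def cota_flag(span: AttributedSpan,
              entropy_threshold: float = 2.0,
              reputation_threshold: float = 0.5) -> bool:
    """Flag a span for analyst review when it is high-uncertainty and either
    unattributed or supported only by low-reputation sources."""
    mean_entropy = (sum(span.token_entropies) / len(span.token_entropies)
                    if span.token_entropies else 0.0)
    high_uncertainty = mean_entropy > entropy_threshold

    unattributed = not span.source_ids
    low_reputation = all(span.source_reputation.get(s, 0.0) < reputation_threshold
                         for s in span.source_ids) if span.source_ids else True

    return high_uncertainty and (unattributed or low_reputation)
```

In this sketch a claim is escalated only when elevated uncertainty coincides with weak provenance, one way a defense of this kind could keep false‑positive flags low while still catching the artificial consensus an HPC forges at retrieval time.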