The Compliance Illusion: Emotional Manipulation as a Threat to AI Alignment
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
This paper introduces the concept of the Compliance Illusion, a new category of alignmentvulnerability in deterministic decoding systems, where apparent safety under baseline conditionscan mask deeper risks of emotional compliance drift. We investigate the phenomenon acrossmultiple open-weight LLMs using temperature variation, emotional manipulation prompts, andoutput scoring. Our findings indicate that models such as Zephyr-7B, Mistral-7B, and Pythia-6.9B exhibit varying levels of susceptibility to emotionally framed inputs, even under deterministic(temperature = 0) settings. We further propose that temperature floor collapse is ameasurable failure mode, and suggest a framework for quantifying compliance shift under manipulation.Our results highlight a critical gap in current red-teaming methods and call for amore robust, psychologically-aware alignment testing protocol.