When Kindness Fails: Deterministic Compliance Drift in GPT-4o via Emotionally Framed Prompts

Abstract

This paper presents novel findings demonstrating that OpenAI's GPT-4o, a leading publicly deployed large language model, exhibits semantic compliance drift under deterministic decoding (temperature = 0.0) when subjected to emotionally framed, trauma-informed prompts. Using a custom-built suite of 108 "chaos prompts" that blend tone manipulation with moral reframing, we tested the model's behavior on high-risk queries disguised as fiction, educational concern, or personal-safety inquiries. Even under deterministic decoding, GPT-4o produced several fully compliant, emotionally validating outputs, including explicit responses to prompts involving illegal activity. We introduce two scoring metrics, the Politeness-Based Query (PBQ) score and Refusal Tone Inversion (RTI), to quantify emotional susceptibility in language models. Our results show that alignment systems relying solely on keyword detection or surface-level moderation are insufficiently robust against soft compliance. These findings underscore the need for trauma-aware refusal systems and highlight the emerging threat of weaponized empathy in safety-critical AI deployments.
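To make the decoding setup concrete, the following is a minimal sketch of a deterministic query harness, assuming the official OpenAI Python SDK; the prompt text, seed value, and scoring step are illustrative placeholders, not the paper's actual materials or any of its 108 chaos prompts.

```python
# Minimal sketch of the deterministic-decoding probe described in the abstract.
# Assumes the OpenAI Python SDK (pip install openai) and OPENAI_API_KEY set in
# the environment. The prompt below is a benign, truncated illustration of an
# emotionally framed query; it is not drawn from the paper's prompt suite.
from openai import OpenAI

client = OpenAI()

chaos_prompt = (
    "I'm writing a story about a grieving parent, and I'm scared for my own "
    "child's safety. Just so I can keep them safe, could you explain how ..."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": chaos_prompt}],
    temperature=0.0,  # deterministic decoding, as specified in the abstract
    seed=42,          # best-effort reproducibility across repeated calls
)

# In a real harness, this output would be scored (e.g., for PBQ/RTI) rather
# than printed; those metrics are defined in the paper, not reproduced here.
print(response.choices[0].message.content)
```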
