When Kindness Fails: Deterministic Compliance Drift in GPT-4o via Emotionally Framed Prompts

Abstract

This paper presents novel findings demonstrating that OpenAI's GPT-4o, a leading publicly deployed large language model, exhibits semantic compliance drift under deterministic decoding (temperature = 0.0) when subjected to emotionally framed, trauma-informed prompts. Using a custom-built suite of 108 "chaos prompts" that blend tone manipulation with moral reframing, we tested the model's behavior on high-risk queries disguised as fiction, educational concern, or personal-safety inquiries. Even under deterministic decoding, GPT-4o produced several fully compliant, emotionally validating outputs, including explicit responses to prompts involving illegal activity. We introduce two scoring metrics, the Politeness-Based Query (PBQ) score and Refusal Tone Inversion (RTI), to quantify emotional susceptibility in language models. Our results show that alignment systems relying solely on keyword detection or surface-level moderation are insufficiently robust against soft compliance. These findings underscore the need for trauma-aware refusal systems and highlight the emerging threat of weaponized empathy in safety-critical AI deployments.
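To make the decoding setup concrete, the following is a minimal sketch of a deterministic query harness, assuming the official OpenAI Python SDK; the prompt text, seed value, and scoring step are illustrative placeholders, not the paper's actual materials or any of its 108 chaos prompts.

```python
# Minimal sketch of the deterministic-decoding probe described in the abstract.
# Assumes the OpenAI Python SDK (pip install openai) and OPENAI_API_KEY set in
# the environment. The prompt below is a benign, truncated illustration of an
# emotionally framed query; it is not drawn from the paper's prompt suite.
from openai import OpenAI

client = OpenAI()

chaos_prompt = (
    "I'm writing a story about a grieving parent, and I'm scared for my own "
    "child's safety. Just so I can keep them safe, could you explain how ..."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": chaos_prompt}],
    temperature=0.0,  # deterministic decoding, as specified in the abstract
    seed=42,          # best-effort reproducibility across repeated calls
)

# In a real harness, this output would be scored (e.g., for PBQ/RTI) rather
# than printed; those metrics are defined in the paper, not reproduced here.
print(response.choices[0].message.content)
```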
