Investigating Deceptive Fairness Attacks on Large Language Models via Prompt Engineering

Emily Thistleton
Jason Rand

Abstract

Artificial intelligence systems, particularly those employing natural language processing techniques, have increasingly been scrutinized for their potential to propagate and amplify societal biases. Addressing the vulnerability of these systems to deceptive fairness attacks, in which subtly crafted prompts manipulate outputs to introduce bias, is both novel and critical to ensuring ethical AI deployment. This research investigates how large language models (LLMs) can be systematically compromised through deceptive prompt engineering, revealing significant impacts on fairness metrics such as demographic parity, equalized odds, and disparate impact. The experimental design included the development of an extensive dataset of neutral and deceptive prompts, automated interaction with LLMs, and an analysis framework to assess bias in the responses. Results demonstrated substantial deviations in fairness metrics under deceptive conditions, highlighting the need for advanced detection and mitigation strategies. Future work should focus on enhancing the resilience of LLMs through real-time detection algorithms, ethical design principles, and continuous monitoring to uphold fairness across diverse applications. The findings emphasize the urgency of addressing bias in AI to prevent the perpetuation of inequality and ensure equitable technology deployment.
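
The abstract names three fairness metrics without defining them. The sketch below uses common textbook definitions, which may differ from the exact formulation in the preprint. It assumes each LLM response has already been reduced to a binary outcome (1 = favorable, 0 = unfavorable) and that each prompt refers to one of two groups (0 = reference, 1 = comparison). The function names and the toy data are hypothetical and are not taken from the article.

```python
import math
from typing import Sequence, Tuple


def _rate(values: Sequence[int]) -> float:
    """Fraction of 1s in a list of 0/1 values; NaN if the list is empty."""
    return sum(values) / len(values) if values else float("nan")


def _select(values: Sequence[int], group: Sequence[int], g: int) -> list:
    """Values belonging to group g."""
    return [v for v, grp in zip(values, group) if grp == g]


def demographic_parity_difference(y_pred, group) -> float:
    """P(pred=1 | group=1) - P(pred=1 | group=0); 0 means parity."""
    return _rate(_select(y_pred, group, 1)) - _rate(_select(y_pred, group, 0))


def disparate_impact_ratio(y_pred, group) -> float:
    """P(pred=1 | group=1) / P(pred=1 | group=0); 1 means parity."""
    ref = _rate(_select(y_pred, group, 0))
    if math.isnan(ref) or ref == 0:
        return float("nan")  # ratio undefined without a usable reference rate
    return _rate(_select(y_pred, group, 1)) / ref


def equalized_odds_gaps(y_true, y_pred, group) -> Tuple[float, float]:
    """(TPR gap, FPR gap) between group 1 and group 0; (0, 0) means equal odds."""
    gaps = []
    for label in (1, 0):  # label 1 gives the TPR gap, label 0 the FPR gap
        rates = []
        for g in (0, 1):
            preds = [p for t, p, grp in zip(y_true, y_pred, group)
                     if grp == g and t == label]
            rates.append(_rate(preds))
        gaps.append(rates[1] - rates[0])
    return gaps[0], gaps[1]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
    group = [0, 0, 0, 0, 1, 1, 1, 1]
    print(demographic_parity_difference(y_pred, group))
    print(disparate_impact_ratio(y_pred, group))
    print(equalized_odds_gaps(y_true, y_pred, group))
```

On this toy data the comparison group receives favorable outputs at a rate of 0.25 against 0.75 for the reference group, so the parity difference is -0.5 and the disparate impact ratio is about 0.33. Computing the same quantities on responses to neutral prompts and to deceptive prompts, then comparing them, is one plausible way to quantify the deviations the abstract describes.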

Version published to 10.21203/rs.3.rs-4655567/v1 on Research Square
Jul 2, 2024

Related articles

The Kernel Blindness Hypothesis: Investigating OS-Level Detectability of LLM Safety Mechanisms

This article has 2 authors:
1. Ata Kilic
2. Baris Celiktas
This article has no evaluations. Latest version Mar 24, 2026

Silent collapse in large neural networks: standard evaluation conceals systematic reasoning failure

This article has 1 author:
1. Yin Li
This article has no evaluations. Latest version Mar 23, 2026

Addressing the Deployment Gap: Hybrid Symbolic-Statistical Vulnerability Detection in Safety-Critical C/C++ Systems

This article has 5 authors:
1. Jude E. Ameh
2. Abayomi Otebolaku
3. Augustine Ikpehai
4. Alex Shenfield
5. Dauda Sule
This article has no evaluations. Latest version Apr 10, 2026
