The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility?

Yiyi Zhang
Xingyu Chen
Kexin Chen
Yuyang Du
Xilin Dang
Pheng-Ann Heng

Abstract

Recent years have witnessed extensive efforts to enhance Large Language Models (LLMs) across various domains, alongside growing attention to their ethical implications. However, a critical challenge remains largely overlooked: LLMs must strike a balance between rejecting harmful requests for safety and accommodating legitimate ones for utility. This paper presents a Direct Preference Optimization (DPO)-based alignment framework that achieves better overall performance by addressing this ethics-utility trade-off, using chemical-domain applications as a proof of concept. Our alignment pipeline starts with a GPT-assisted three-phase data generation scheme, through which we create LibraChemQA, a chemical question-answering dataset of 31.6k triplet instances. By incorporating a balanced seed into the data generation process, our framework systematically covers both legitimate and illegitimate requests. The framework also introduces a rephrasing mechanism for efficient data augmentation that enhances the model's chemical comprehension. We further develop a novel hybrid evaluation scheme with LLM judges for precise assessment of both safety and utility. Experimental results show substantial improvements in overall performance when both safety and utility are considered: our resulting model, LibraChem, outperforms leading LLMs including Claude-3, GPT-4o, and LLaMA-3 by margins of 13.44%, 7.16%, and 7.10%, respectively, on our released benchmark.
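The abstract names DPO-based alignment on triplet instances but does not spell out the training objective. Below is a minimal sketch of the standard DPO loss (Rafailov et al., 2023) for a single triplet. It assumes that each LibraChemQA triplet is a (prompt, preferred response, dispreferred response) tuple and that summed sequence log-probabilities under the trained policy and a frozen reference model have already been computed. The function name `dpo_loss` and the default `beta = 0.1` are illustrative assumptions, not details taken from the paper.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (prompt, chosen, rejected) triplet.

    Inputs are summed log-probabilities of each full response given the prompt,
    under the policy being trained and under the frozen reference model.
    `beta` is an assumed temperature; the paper's value is not stated here.
    """
    # Implicit reward margin: how much more the policy, relative to the
    # reference model, prefers the chosen response over the rejected one.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    z = beta * margin
    # -log(sigmoid(z)) == softplus(-z); this form avoids overflow for large |z|.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # Policy identical to reference: margin is zero, loss is log 2.
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy shifted toward the chosen answer: loss drops below log 2.
    print(dpo_loss(-5.0, -20.0, -10.0, -12.0))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2 (about 0.693). The loss falls below that value as the policy increasingly favors the preferred answer, and rises above it when the policy drifts toward the dispreferred one.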

Version published to 10.32388/cw5qru on Feb 7, 2025
