Noise-Robust Preference Alignment for Large Language Models via Confidence Estimation and Adaptive Optimization
Abstract
Preference alignment is essential for steering language models toward human intentions, yet synthetic preference data often contains noise that hinders generalization. To address this issue, we introduce a noise-robust alignment framework that enhances model resilience to imperfect training data. The approach integrates a Preference Confidence Estimation module, which assigns reliability scores to preference samples, and an Adaptive Robust Optimization strategy that incorporates these scores into the learning process. This design allows the model to emphasize reliable signals and reduce the impact of noisy supervision. Experiments across dialogue, summarization, and instruction-following benchmarks show consistent improvements over existing alignment methods. Further analysis confirms the complementary effects of the two modules and their robustness under varying noise conditions, highlighting the framework’s ability to promote stable and accurate preference learning.
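To make the high-level idea concrete, the sketch below shows one plausible way reliability scores could be folded into a preference-learning objective: a DPO-style pairwise loss in which each sample's contribution is scaled by its estimated confidence. The abstract does not specify the paper's actual objective or weighting scheme, so the function name, the `confidence` input, and the normalization are illustrative assumptions rather than the authors' method.

```python
# Hypothetical sketch: confidence-weighted DPO-style preference loss.
# The exact objective used in the paper is not given in the abstract;
# this only illustrates how per-sample reliability scores could
# down-weight likely-noisy preference pairs during optimization.
import torch
import torch.nn.functional as F


def confidence_weighted_preference_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(y_w | x), shape (B,)
    policy_rejected_logps: torch.Tensor,  # log p_theta(y_l | x), shape (B,)
    ref_chosen_logps: torch.Tensor,       # reference-model log-probs, shape (B,)
    ref_rejected_logps: torch.Tensor,     # reference-model log-probs, shape (B,)
    confidence: torch.Tensor,             # assumed per-pair reliability in [0, 1], shape (B,)
    beta: float = 0.1,
) -> torch.Tensor:
    """Illustrative DPO-style loss with per-sample confidence weighting."""
    # Implicit reward margin between the chosen and rejected responses.
    logits = beta * (
        (policy_chosen_logps - ref_chosen_logps)
        - (policy_rejected_logps - ref_rejected_logps)
    )
    per_pair_loss = -F.logsigmoid(logits)
    # Scale each pair by its confidence so unreliable pairs contribute less.
    weights = confidence.clamp(0.0, 1.0)
    return (weights * per_pair_loss).sum() / weights.sum().clamp_min(1e-8)
```

Under this assumed formulation, a pair judged unreliable by the confidence estimator contributes proportionally less gradient, which is one simple way to realize the "emphasize reliable signals, reduce noisy supervision" behavior described above.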