LLM for Secure Reserve Price Optimization in Real-time Bidding

Wendi Wu
Shanghua Wen
Minglong Li
Kun Hu
Yongjun Dai
Jing Zhao*


Abstract

Real-time bidding (RTB) plays a crucial role in display advertising, where ad exchanges (ADXs) set reserve prices to initiate auctions among demand-side platforms (DSPs). Optimizing reserve prices is critical for publishers because it directly affects revenue and market efficiency. However, existing research predominantly assumes fixed DSP bidding strategies, which fails to account for real-world settings in which DSP behavior evolves with budget constraints, market fluctuations, and competitive dynamics. This gap between theoretical models and practice underscores the need for more effective reserve price optimization methods. In this paper, we address the gap by proposing a novel framework that integrates large language models (LLMs) into the reward shaping process of reinforcement learning (RL). Our approach leverages the comprehension and reasoning capabilities of LLMs to design and fine-tune reward structures, enabling RL algorithms to respond effectively to diverse and dynamic DSP bidding strategies. We validate our method on real-world transaction data from CAINIAO's operational environment. All user data used in this study was rigorously anonymized and was collected and processed in compliance with GDPR-aligned privacy protocols. Security-preserving mechanisms were implemented throughout the experimental pipeline to ensure transactional data integrity and prevent unauthorized access. Experimental results demonstrate that our framework achieves a 21.51% improvement in average income compared to state-of-the-art methods.
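
To make concrete why the reserve price matters for publisher revenue, the following is a minimal illustrative sketch, not the paper's method. It assumes a sealed-bid second-price auction with a reserve, which the abstract does not specify. The names `auction_revenue` and `shaped_reward`, and the `shaping_term` and `weight` parameters, are hypothetical; `shaping_term` merely stands in for whatever adjustment an LLM-designed reward might contribute.

```python
from typing import Sequence


def auction_revenue(bids: Sequence[float], reserve: float) -> float:
    """Publisher revenue for one auction.

    Assumption: second-price auction with a reserve price
    (the auction format is not stated in the abstract).
    """
    # Only bids at or above the reserve take part in the auction.
    eligible = sorted((b for b in bids if b >= reserve), reverse=True)
    if not eligible:
        return 0.0  # nobody clears the reserve: the impression goes unsold
    if len(eligible) == 1:
        return reserve  # a lone eligible bidder pays the reserve
    return eligible[1]  # winner pays the second-highest eligible bid


def shaped_reward(revenue: float, shaping_term: float, weight: float = 0.1) -> float:
    """Hypothetical shaped reward: raw revenue plus a weighted shaping term."""
    return revenue + weight * shaping_term


bids = [1.0, 3.0, 5.0]
for reserve in (2.0, 4.0, 6.0):
    print(reserve, auction_revenue(bids, reserve))
```

With these bids, a reserve of 2.0 yields 3.0, a reserve of 4.0 yields 4.0, and a reserve of 6.0 yields 0.0: raising the reserve can lift revenue until it exceeds the highest bid, at which point the impression goes unsold. If DSPs change their bids over time, the best reserve moves with them, which is the setting the abstract targets.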

Version published to 10.21203/rs.3.rs-6765381/v1 on Research Square
Jun 4, 2025
