PEBSI: Policy-Efficient Branching Variable Selection via Reinforcement Learning


Abstract

Mixed Integer Linear Programs (MILPs) are widely used to model real-world optimization problems, with the branch-and-bound (B&B) algorithm serving as a fundamental solution method. The choice of branching strategy is crucial, and recent advances in machine learning have fueled interest in learning-to-branch techniques. Although imitation learning (IL) has shown promising results in replicating handcrafted branching heuristics, its performance is fundamentally limited by the quality of the expert strategy and the high cost of data labeling. Reinforcement learning (RL) presents a promising alternative, but the complexity of the branching process poses significant challenges for its direct application. Consequently, many existing RL-based methods still rely on expert demonstrations for pretraining or data augmentation, or impose additional constraints on the training procedure. This work introduces PEBSI, an efficient RL-based branching policy that eliminates the need for expert demonstrations or auxiliary configurations. The RL agent is trained using retrospective trajectories constructed from the original B&B search tree, with a novel reward function and exploration strategy designed to improve policy efficiency. Experimental evaluations on diverse MILP benchmarks demonstrate that PEBSI outperforms other RL-based methods trained without expert guidance and, in some cases, even surpasses the IL-based approach.
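To make the setting concrete, the sketch below shows a minimal branch-and-bound solver for a tiny 0/1 knapsack MILP with a pluggable branching policy. This is an illustration of where a branching policy (learned or handcrafted) sits inside B&B, not the paper's method: all function names (`lp_relax`, `branch_and_bound`, `most_fractional`) are hypothetical, and the LP relaxation is solved by the standard greedy fractional-knapsack bound rather than a general LP solver.

```python
def lp_relax(values, weights, cap, fixed):
    """Greedy LP-relaxation bound for 0/1 knapsack with some variables
    fixed to 0 or 1. Returns (upper_bound, fractional_solution) or
    (None, None) when the fixings are infeasible."""
    base = sum(values[i] for i, v in fixed.items() if v == 1)
    used = sum(weights[i] for i, v in fixed.items() if v == 1)
    if used > cap:
        return None, None
    free = [i for i in range(len(values)) if i not in fixed]
    free.sort(key=lambda i: values[i] / weights[i], reverse=True)
    room = cap - used
    x = {i: float(v) for i, v in fixed.items()}
    bound = base
    for pos, i in enumerate(free):
        take = min(1.0, room / weights[i])
        x[i] = take
        bound += take * values[i]
        room -= take * weights[i]
        if room <= 1e-9:                       # capacity exhausted
            for j in free[pos + 1:]:
                x[j] = 0.0
            break
    return bound, x

def most_fractional(candidates, x):
    """Stand-in branching rule: pick the variable closest to 0.5.
    A learned policy would replace this function."""
    return min(candidates, key=lambda i: abs(x[i] - 0.5))

def branch_and_bound(values, weights, cap, branch_policy=most_fractional):
    """Depth-first B&B; returns (best objective, nodes explored)."""
    best_val, nodes, stack = 0.0, 0, [dict()]
    while stack:
        fixed = stack.pop()
        nodes += 1
        bound, x = lp_relax(values, weights, cap, fixed)
        if bound is None or bound <= best_val + 1e-9:
            continue                            # infeasible or pruned
        frac = [i for i, xi in x.items() if 1e-9 < xi < 1 - 1e-9]
        if not frac:                            # integral: new incumbent
            best_val = sum(values[i] for i, xi in x.items() if xi > 0.5)
            continue
        j = branch_policy(frac, x)              # branching decision
        stack.append({**fixed, j: 0})
        stack.append({**fixed, j: 1})
    return best_val, nodes
```

The sequence of `branch_policy` calls along a solve traces out exactly the kind of decision trajectory that, per the abstract, PEBSI constructs retrospectively from the B&B tree for RL training.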
