PEBSI: Policy-Efficient Branching Variable Selection via Reinforcement Learning


Abstract

Mixed Integer Linear Programs (MILPs) are widely used to model real-world optimization problems, with the branch-and-bound (B&B) algorithm serving as a fundamental solution method. The choice of branching strategy is crucial, and recent advances in machine learning have fueled interest in learning-to-branch techniques. Although imitation learning (IL) has shown promising results in replicating handcrafted branching heuristics, its performance is fundamentally limited by the quality of the expert strategy and the high cost of data labeling. Reinforcement learning (RL) presents a promising alternative, but the complexity of the branching process poses significant challenges for its direct application. Consequently, many existing RL-based methods still rely on expert demonstrations for pretraining or data augmentation, or impose additional constraints on the training procedure. This work introduces PEBSI, an efficient RL-based branching policy that eliminates the need for expert demonstrations or auxiliary configurations. The RL agent is trained using retrospective trajectories constructed from the original B&B search tree, with a novel reward function and exploration strategy designed to improve policy efficiency. Experimental evaluations on diverse MILP benchmarks demonstrate that PEBSI outperforms other RL-based methods trained without expert guidance and, in some cases, even surpasses the IL-based approach.
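To make the setting concrete, the sketch below shows a minimal branch-and-bound solver for a tiny 0/1 knapsack MILP with a pluggable branching policy. This is an illustration of where a branching policy (learned or handcrafted) sits inside B&B, not the paper's method: all function names (`lp_relax`, `branch_and_bound`, `most_fractional`) are hypothetical, and the LP relaxation is solved by the standard greedy fractional-knapsack bound rather than a general LP solver.

```python
def lp_relax(values, weights, cap, fixed):
    """Greedy LP-relaxation bound for 0/1 knapsack with some variables
    fixed to 0 or 1. Returns (upper_bound, fractional_solution) or
    (None, None) when the fixings are infeasible."""
    base = sum(values[i] for i, v in fixed.items() if v == 1)
    used = sum(weights[i] for i, v in fixed.items() if v == 1)
    if used > cap:
        return None, None
    free = [i for i in range(len(values)) if i not in fixed]
    free.sort(key=lambda i: values[i] / weights[i], reverse=True)
    room = cap - used
    x = {i: float(v) for i, v in fixed.items()}
    bound = base
    for pos, i in enumerate(free):
        take = min(1.0, room / weights[i])
        x[i] = take
        bound += take * values[i]
        room -= take * weights[i]
        if room <= 1e-9:                       # capacity exhausted
            for j in free[pos + 1:]:
                x[j] = 0.0
            break
    return bound, x

def most_fractional(candidates, x):
    """Stand-in branching rule: pick the variable closest to 0.5.
    A learned policy would replace this function."""
    return min(candidates, key=lambda i: abs(x[i] - 0.5))

def branch_and_bound(values, weights, cap, branch_policy=most_fractional):
    """Depth-first B&B; returns (best objective, nodes explored)."""
    best_val, nodes, stack = 0.0, 0, [dict()]
    while stack:
        fixed = stack.pop()
        nodes += 1
        bound, x = lp_relax(values, weights, cap, fixed)
        if bound is None or bound <= best_val + 1e-9:
            continue                            # infeasible or pruned
        frac = [i for i, xi in x.items() if 1e-9 < xi < 1 - 1e-9]
        if not frac:                            # integral: new incumbent
            best_val = sum(values[i] for i, xi in x.items() if xi > 0.5)
            continue
        j = branch_policy(frac, x)              # branching decision
        stack.append({**fixed, j: 0})
        stack.append({**fixed, j: 1})
    return best_val, nodes
```

The sequence of `branch_policy` calls along a solve traces out exactly the kind of decision trajectory that, per the abstract, PEBSI constructs retrospectively from the B&B tree for RL training.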
