Policy-Guided Model Predictive Path Integral for Safe Manipulator Trajectory Planning with Constrained Discounted Reinforcement Learning
Abstract
To address the difficulty of enforcing hard constraints and the weak environmental generalization encountered in safe trajectory planning for manipulators in complex environments, a Policy-Guided Model Predictive Path Integral (PG-MPPI) planning framework is proposed. The framework integrates the strengths of reinforcement learning and model predictive control to combine global prior guidance, local real-time optimization, and hard-constraint safety assurance: a Constraint-Discounted Soft Actor-Critic (CD-SAC) policy is learned offline, incorporating the configuration-space distance field as a safety guidance term so that obstacle-avoidance behavior is acquired during training; the offline policy then guides the online sampling and optimization of MPPI, improving sampling efficiency and planning quality; and a Control Barrier Function (CBF) safety filter revises control commands in real time, ensuring strict constraint satisfaction. Taking the SIASUN T12B manipulator as the research object, comparative simulation experiments are carried out in multi-obstacle scenarios. The results show that PG-MPPI outperforms the baseline algorithms in the success rate of collision-free target reaching, ensures trajectory smoothness and feasibility, and adapts well to dynamic environments, providing an efficient solution for the autonomous and safe operation of manipulators.
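The three-stage pipeline the abstract describes (a learned policy prior guiding MPPI sampling, followed by a CBF safety filter on the resulting command) can be sketched in miniature. The sketch below is purely illustrative and not the paper's implementation: the CD-SAC policy, the manipulator dynamics, and the QP-based CBF filter are replaced by a 2D point mass, a hand-written goal-seeking prior, a single circular obstacle, and a closed-form half-space projection; all constants (`DT`, `HORIZON`, `SIGMA`, obstacle geometry) are assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
DT, HORIZON, N_SAMPLES = 0.05, 20, 128
SIGMA, LAMBDA = 0.3, 1.0
GOAL = np.array([1.0, 1.0])
OBS_C, OBS_R = np.array([0.5, 0.4]), 0.2    # circular obstacle (assumed)

def dynamics(x, u):
    return x + DT * u                       # single-integrator model (assumed)

def prior_policy(x):
    # Stand-in for the learned CD-SAC prior: proportional pull toward the goal.
    return np.clip(GOAL - x, -1.0, 1.0)

def stage_cost(x):
    # Goal-distance cost plus a soft obstacle-penetration penalty.
    d = np.linalg.norm(x - OBS_C, axis=-1)
    return np.sum((x - GOAL) ** 2, axis=-1) + 50.0 * np.maximum(0.0, OBS_R - d)

def mppi_step(x0):
    # Policy-guided sampling: perturb the prior policy along each rollout,
    # then importance-weight the first controls by exponentiated cost.
    X = np.tile(x0, (N_SAMPLES, 1))
    costs = np.zeros(N_SAMPLES)
    for t in range(HORIZON):
        U = prior_policy(X) + rng.normal(0.0, SIGMA, size=(N_SAMPLES, 2))
        if t == 0:
            first = U
        X = dynamics(X, U)
        costs += stage_cost(X)
    w = np.exp(-(costs - costs.min()) / LAMBDA)
    return (w / w.sum()) @ first

def cbf_filter(x, u, alpha=5.0):
    # Barrier h(x) = ||x - c||^2 - r^2; enforce grad(h) @ u >= -alpha * h
    # by minimally projecting u onto the safe half-space when violated.
    h = np.sum((x - OBS_C) ** 2) - OBS_R ** 2
    grad = 2.0 * (x - OBS_C)
    viol = grad @ u + alpha * h
    if viol < 0.0:
        u = u - viol * grad / (grad @ grad)
    return u

x = np.array([0.0, 0.0])
min_gap = np.inf
for _ in range(200):
    x = dynamics(x, cbf_filter(x, mppi_step(x)))
    min_gap = min(min_gap, np.linalg.norm(x - OBS_C))
print(f"final distance to goal: {np.linalg.norm(x - GOAL):.3f}")
print(f"closest approach to obstacle: {min_gap:.3f} (radius {OBS_R})")
```

Even in this toy setting the division of labor mirrors the framework: the prior concentrates MPPI samples near useful behavior, the path-integral weighting handles local optimization, and the CBF projection is the last-resort hard-constraint layer that overrides the optimizer only when the barrier condition would be violated.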