Policy-Guided Model Predictive Path Integral for Safe Manipulator Trajectory Planning with Constrained Discounted Reinforcement Learning
Abstract
To address the difficulty of enforcing hard constraints and the weak environmental generalization encountered in safe trajectory planning for manipulators in complex environments, a Policy-Guided Model Predictive Path Integral (PG-MPPI) planning framework is proposed. The framework integrates the strengths of reinforcement learning and model predictive control to combine global prior guidance, local real-time optimization, and hard-constraint safety assurance: a Constraint-Discounted Soft Actor-Critic (CD-SAC) policy is learned offline, incorporating the configuration-space distance field as a safety guidance term so that obstacle-avoidance behavior is acquired during training; the offline policy then guides the online sampling and optimization of MPPI, improving sampling efficiency and planning quality; and a Control Barrier Function (CBF) safety filter revises control commands in real time, ensuring strict constraint satisfaction. Taking the SIASUN T12B manipulator as the research object, comparative simulation experiments are carried out in multi-obstacle scenarios. The results show that PG-MPPI outperforms the baseline algorithms in the success rate of collision-free target reaching, ensures trajectory smoothness and feasibility, and adapts well to dynamic environments, providing an efficient solution for the autonomous and safe operation of manipulators.
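The three-stage pipeline the abstract describes (a learned policy prior guiding MPPI sampling, followed by a CBF safety filter on the resulting command) can be sketched in miniature. The sketch below is purely illustrative and not the paper's implementation: the CD-SAC policy, the manipulator dynamics, and the QP-based CBF filter are replaced by a 2D point mass, a hand-written goal-seeking prior, a single circular obstacle, and a closed-form half-space projection; all constants (`DT`, `HORIZON`, `SIGMA`, obstacle geometry) are assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
DT, HORIZON, N_SAMPLES = 0.05, 20, 128
SIGMA, LAMBDA = 0.3, 1.0
GOAL = np.array([1.0, 1.0])
OBS_C, OBS_R = np.array([0.5, 0.4]), 0.2    # circular obstacle (assumed)

def dynamics(x, u):
    return x + DT * u                       # single-integrator model (assumed)

def prior_policy(x):
    # Stand-in for the learned CD-SAC prior: proportional pull toward the goal.
    return np.clip(GOAL - x, -1.0, 1.0)

def stage_cost(x):
    # Goal-distance cost plus a soft obstacle-penetration penalty.
    d = np.linalg.norm(x - OBS_C, axis=-1)
    return np.sum((x - GOAL) ** 2, axis=-1) + 50.0 * np.maximum(0.0, OBS_R - d)

def mppi_step(x0):
    # Policy-guided sampling: perturb the prior policy along each rollout,
    # then importance-weight the first controls by exponentiated cost.
    X = np.tile(x0, (N_SAMPLES, 1))
    costs = np.zeros(N_SAMPLES)
    for t in range(HORIZON):
        U = prior_policy(X) + rng.normal(0.0, SIGMA, size=(N_SAMPLES, 2))
        if t == 0:
            first = U
        X = dynamics(X, U)
        costs += stage_cost(X)
    w = np.exp(-(costs - costs.min()) / LAMBDA)
    return (w / w.sum()) @ first

def cbf_filter(x, u, alpha=5.0):
    # Barrier h(x) = ||x - c||^2 - r^2; enforce grad(h) @ u >= -alpha * h
    # by minimally projecting u onto the safe half-space when violated.
    h = np.sum((x - OBS_C) ** 2) - OBS_R ** 2
    grad = 2.0 * (x - OBS_C)
    viol = grad @ u + alpha * h
    if viol < 0.0:
        u = u - viol * grad / (grad @ grad)
    return u

x = np.array([0.0, 0.0])
min_gap = np.inf
for _ in range(200):
    x = dynamics(x, cbf_filter(x, mppi_step(x)))
    min_gap = min(min_gap, np.linalg.norm(x - OBS_C))
print(f"final distance to goal: {np.linalg.norm(x - GOAL):.3f}")
print(f"closest approach to obstacle: {min_gap:.3f} (radius {OBS_R})")
```

Even in this toy setting the division of labor mirrors the framework: the prior concentrates MPPI samples near useful behavior, the path-integral weighting handles local optimization, and the CBF projection is the last-resort hard-constraint layer that overrides the optimizer only when the barrier condition would be violated.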