Safe Reinforcement Learning for Vision-Based Robotic Manipulation in Human-Centered Environments
Abstract
Autonomous systems performing object manipulation in human-robot collaboration scenarios face fundamental challenges in balancing adaptability with safety constraints. We present an RL framework that addresses these challenges through safety-aware policy learning. Building upon OpenAI Safety Gym, we extend its capabilities by implementing a robotic arm model for object manipulation tasks. Our approach employs end-to-end policy learning, comparing a constrained Lagrangian variant of Proximal Policy Optimization (cPPO) against standard PPO and Soft Actor-Critic (SAC) baselines. To handle high-dimensional visual inputs, we develop a structured representation learning method that captures multiple skills, objects, and their interactions. The framework enables goal-conditioned manipulation across object configurations and demonstrates strong compositional generalization: an agent trained on simple scenarios with two cube-shaped objects successfully generalized to tasks with three distinct objects in more cluttered settings. Due to the computational cost of simulating high-mass objects, testing was limited to scenarios with up to three objects. Experimental results show that cPPO achieves superior safety performance, with an average episode cost of 15.26 compared to 18.03 for PPO and 19.48 for SAC. While cPPO’s task performance (average episode reward ∼30) is slightly lower than PPO’s (∼35), it significantly outperforms SAC (∼12). All algorithms converge by 200,000 environment steps, with cPPO achieving rapid safety compliance and steady improvement in task performance. These findings demonstrate the effectiveness of integrating safety constraints into RL for autonomous manipulation, advancing the practical deployment of collaborative robotic systems.
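The constrained Lagrangian variant of PPO referenced above is generally built on a reward objective penalized by a dual variable on expected episode cost. The sketch below illustrates that idea under simple assumptions (a fixed cost limit and a dual-ascent learning rate); the names `lagrangian_penalty`, `cost_limit`, and `lambda_lr` are hypothetical and are not taken from the paper's implementation.

```python
# Illustrative sketch (not the authors' code) of the Lagrangian relaxation used
# by constrained PPO methods: the policy maximizes reward while a dual variable
# lambda penalizes expected episode cost above a threshold d.

def lagrangian_penalty(avg_episode_cost: float,
                       lam: float,
                       cost_limit: float = 15.0,   # assumed cost threshold d
                       lambda_lr: float = 0.05      # assumed dual-ascent step size
                       ) -> tuple[float, float]:
    """Return the penalty added to the PPO loss and the updated dual variable.

    Penalized objective:  L(theta, lam) = J_reward(theta) - lam * (J_cost(theta) - d)
    Dual ascent:          lam <- max(0, lam + lr * (J_cost(theta) - d))
    """
    violation = avg_episode_cost - cost_limit
    penalty = lam * violation                          # subtracted from the reward objective
    new_lam = max(0.0, lam + lambda_lr * violation)    # projected gradient ascent on lambda
    return penalty, new_lam


# Example: a cost above the limit increases lambda, tightening the constraint.
penalty, lam = lagrangian_penalty(avg_episode_cost=18.0, lam=1.0)
print(penalty, lam)  # 3.0 1.15
```

In this formulation, episodes that exceed the cost limit raise lambda and shift the optimization toward safety, which is consistent with the rapid safety compliance reported for cPPO.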