Robust Quadrupedal Locomotion on Complex Terrains via Adaptive Entropy Learning
Abstract
Quadruped robots are increasingly required to operate in complex, unstructured terrains for tasks such as inspection and exploration. In harsh environments, exteroceptive sensors often provide only partial or noisy observations, making proprioceptive terrain estimation and motion stability even more critical. In addition, simulation-based policy training must efficiently balance exploration and exploitation. In this paper, we make two main contributions. (1) Building on the terrain-imagination framework (CENet) of DreamWaQ, we introduce stability-oriented rewards based on the Variable Height Inverted Pendulum (VHIP) model and a stand-still pose reward, improving both static and dynamic stability on complex terrains. (2) We extend PPO with a multi-metric dynamic entropy coefficient that adapts to performance gaps (velocity tracking and terrain utilization), yielding faster convergence and improved final performance in simulation. Ablation studies in MuJoCo show that VHIP rewards significantly reduce fall rates on challenging terrain levels; comparison with DreamWaQ in Isaac Gym shows statistically significant gains in linear velocity tracking and convergence speed. We further deploy the policy on a DeepRobotics Lite3 robot; real-world tests on stairs, rough and smooth surfaces, and grassland provide qualitative evidence of deployment feasibility.
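The abstract does not give the exact form of the multi-metric dynamic entropy coefficient, so the following is only a minimal sketch of the general idea: the PPO entropy bonus is scaled up while velocity tracking and terrain utilization lag behind their targets, and scaled down as those gaps close. The function names, the linear interpolation rule, the weights, the bounds, and the targets are all assumptions made for illustration and are not taken from the paper.

```python
def normalized_gap(value: float, target: float) -> float:
    """Fraction of the target still unmet, clipped to [0, 1]."""
    if target <= 0.0:
        raise ValueError("target must be positive")
    return min(max((target - value) / target, 0.0), 1.0)


def adaptive_entropy_coef(
    vel_tracking: float,          # current velocity-tracking score (hypothetical metric)
    terrain_util: float,          # current terrain-utilization score (hypothetical metric)
    vel_target: float = 1.0,      # assumed target values, not from the paper
    terrain_target: float = 1.0,
    w_vel: float = 0.5,           # assumed metric weights, chosen to sum to 1
    w_terrain: float = 0.5,
    coef_min: float = 0.001,      # assumed bounds on the entropy coefficient
    coef_max: float = 0.02,
) -> float:
    """Interpolate the entropy coefficient between coef_min and coef_max.

    Large performance gaps give a coefficient near coef_max (more exploration);
    small gaps give a coefficient near coef_min (more exploitation).
    """
    gap = (w_vel * normalized_gap(vel_tracking, vel_target)
           + w_terrain * normalized_gap(terrain_util, terrain_target))
    return coef_min + (coef_max - coef_min) * gap


def ppo_total_loss(policy_loss: float, value_loss: float, entropy: float,
                   entropy_coef: float, value_coef: float = 0.5) -> float:
    """Standard PPO objective layout: the entropy term is subtracted as a bonus."""
    return policy_loss + value_coef * value_loss - entropy_coef * entropy


if __name__ == "__main__":
    # Early training: both metrics far from target, so exploration is encouraged.
    early = adaptive_entropy_coef(vel_tracking=0.2, terrain_util=0.1)
    # Late training: both metrics near target, so the bonus shrinks.
    late = adaptive_entropy_coef(vel_tracking=0.95, terrain_util=0.9)
    print(f"early coef: {early:.4f}, late coef: {late:.4f}")
```

In a training loop, a coefficient of this kind would be recomputed once per policy update from running averages of the two metrics and passed to the loss in place of a fixed entropy weight.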