Robust Quadrupedal Locomotion on Complex Terrains via Adaptive Entropy Learning

Jiale Chen
Lingyun Kong
ZhenYao Zhang
Zhipeng Xue

Abstract

Quadruped robots are increasingly required to operate in complex, unstructured terrains for tasks such as inspection and exploration. In harsh environments, exteroceptive sensors often provide only partial or noisy observations, making proprioceptive terrain estimation and motion stability even more critical. In addition, simulation-based policy training must efficiently balance exploration and exploitation. In this paper, we make two main contributions. (1) Building on the terrain-imagination framework (CENet) of DreamWaQ, we introduce stability-oriented rewards based on the Variable Height Inverted Pendulum (VHIP) model and a stand-still pose reward, improving both static and dynamic stability on complex terrains. (2) We extend PPO with a multi-metric dynamic entropy coefficient that adapts to performance gaps (velocity tracking and terrain utilization), yielding faster convergence and improved final performance in simulation. Ablation studies in MuJoCo show that VHIP rewards significantly reduce fall rates on challenging terrain levels; comparison with DreamWaQ in Isaac Gym shows statistically significant gains in linear velocity tracking and convergence speed. We further deploy the policy on a DeepRobotics Lite3 robot; real-world tests on stairs, rough and smooth surfaces, and grassland provide qualitative evidence of deployment feasibility.
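The abstract describes the entropy coefficient only at a high level, so the following is a minimal sketch of one way a multi-metric, gap-driven coefficient could be wired into a PPO loss. It is not the authors' formulation. The names `EntropyScheduleConfig`, `adaptive_entropy_coef`, and `ppo_total_loss`, the linear weighting of the two gaps, and every numeric value are illustrative assumptions. The idea shown is that a large shortfall in velocity tracking or terrain utilization raises the entropy bonus (more exploration), and the bonus shrinks as both metrics approach their targets.

```python
from dataclasses import dataclass


@dataclass
class EntropyScheduleConfig:
    # All values are illustrative assumptions, not taken from the paper.
    min_coef: float = 0.001       # entropy coefficient when both targets are met
    max_coef: float = 0.02        # entropy coefficient at the largest possible gap
    velocity_weight: float = 0.5  # weight of the velocity-tracking gap
    terrain_weight: float = 0.5   # weight of the terrain-utilization gap


def _clamp01(x: float) -> float:
    """Clamp a value to the closed interval [0, 1]."""
    return max(0.0, min(1.0, x))


def adaptive_entropy_coef(vel_score: float, vel_target: float,
                          terrain_level: float, terrain_target: float,
                          cfg: EntropyScheduleConfig) -> float:
    """Map two normalized performance gaps to an entropy coefficient."""
    if vel_target <= 0 or terrain_target <= 0:
        raise ValueError("targets must be positive")
    # Each gap is 0 when the target is reached and 1 when nothing is achieved.
    vel_gap = _clamp01(1.0 - vel_score / vel_target)
    terrain_gap = _clamp01(1.0 - terrain_level / terrain_target)
    # Weighted combination, normalized so the result stays in [0, 1].
    total_weight = cfg.velocity_weight + cfg.terrain_weight
    gap = (cfg.velocity_weight * vel_gap + cfg.terrain_weight * terrain_gap) / total_weight
    # Linear interpolation between the minimum and maximum coefficient.
    return cfg.min_coef + (cfg.max_coef - cfg.min_coef) * gap


def ppo_total_loss(policy_loss: float, value_loss: float, entropy: float,
                   entropy_coef: float, value_coef: float = 0.5) -> float:
    """Standard PPO objective layout: the entropy bonus is subtracted from the loss."""
    return policy_loss + value_coef * value_loss - entropy_coef * entropy


cfg = EntropyScheduleConfig()
# Early in training: poor tracking, low terrain level, so a larger coefficient.
early = adaptive_entropy_coef(0.2, 1.0, 1.0, 9.0, cfg)
# Late in training: both metrics near target, so a smaller coefficient.
late = adaptive_entropy_coef(0.95, 1.0, 8.0, 9.0, cfg)
```

In a training loop, the coefficient would be recomputed from rolling statistics once per policy update and passed to the loss in place of a fixed constant.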

Version published to 10.21203/rs.3.rs-9269338/v1 on Research Square
Apr 14, 2026
