A Brief Tutorial on Reinforcement Learning: From MDP to DDPG

Abstract

This tutorial presents a coherent overview of reinforcement learning (RL), tracing its evolution from theoretical foundations to modern deep reinforcement learning algorithms. We begin with the mathematical formalization of sequential decision-making via Markov Decision Processes (MDPs). Central to RL theory are the Bellman equation for policy evaluation and its extension, the Bellman optimality equation, which provides the fundamental condition for optimal behavior. The journey from these equations to practical algorithms is explored, starting with model-based dynamic programming and progressing to model-free temporal-difference learning. We highlight Q-learning as a pivotal model-free algorithm that directly implements the Bellman optimality equation through sampling. To handle high-dimensional state spaces, the paradigm shifts to function approximation and deep reinforcement learning, exemplified by Deep Q-Networks (DQN). A significant challenge arises in continuous action spaces; actor-critic methods address it. We examine the Deep Deterministic Policy Gradient (DDPG) algorithm in detail, explaining how it adapts the principles of optimality to continuous control by maintaining separate actor and critic networks. The tutorial concludes with a unified perspective, framing RL's development as a logical progression from defining optimality conditions to developing scalable solution algorithms, and briefly surveys subsequent improvements and future directions, all underpinned by the enduring framework of the Bellman equations.
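
To make the abstract's central claim concrete, the following is a minimal sketch of tabular Q-learning, whose update rule is a sampled, incremental form of the Bellman optimality equation. It is an illustrative example, not code from the tutorial itself; the Gymnasium-style environment interface (reset/step over discrete state and action spaces) and the hyperparameter values are assumptions.

import numpy as np

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Tabular Q-learning on an assumed Gymnasium-style environment with
    # discrete states and actions (e.g. FrozenLake). Q[s, a] estimates the
    # optimal action-value function Q*(s, a).
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy exploration around the current greedy policy.
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Sampled Bellman optimality backup:
            # target = r + gamma * max_a' Q(s', a'), with no bootstrap at terminal states.
            target = reward + gamma * (0.0 if terminated else np.max(Q[next_state]))
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q

Replacing the Q table with a neural network (together with experience replay and a target network) gives DQN, and replacing the max over actions with a learned deterministic actor yields DDPG, following the progression the tutorial describes.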
