A Brief Tutorial on Reinforcement Learning: From MDP to DDPG

Abstract

This tutorial presents a coherent overview of reinforcement learning (RL), tracing its evolution from theoretical foundations to modern deep reinforcement learning algorithms. We begin with the mathematical formalization of sequential decision-making via Markov Decision Processes (MDPs). Central to RL theory are the Bellman equation for policy evaluation and its extension, the Bellman optimality equation, which provides the fundamental condition for optimal behavior. The journey from these equations to practical algorithms is explored, starting with model-based dynamic programming and progressing to model-free temporal-difference learning. We highlight Q-learning as a pivotal model-free algorithm that directly implements the Bellman optimality equation through sampling. To handle high-dimensional state spaces, the paradigm shifts to function approximation and deep reinforcement learning, exemplified by Deep Q-Networks (DQN). A significant challenge arises in continuous action spaces; actor-critic methods address it. We examine the Deep Deterministic Policy Gradient (DDPG) algorithm in detail, explaining how it adapts the principles of optimality to continuous control by maintaining separate actor and critic networks. The tutorial concludes with a unified perspective, framing RL's development as a logical progression from defining optimality conditions to developing scalable solution algorithms, and briefly surveys subsequent improvements and future directions, all underpinned by the enduring framework of the Bellman equations.
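
To make the abstract's central claim concrete, the following is a minimal sketch of tabular Q-learning, whose update rule is a sampled, incremental form of the Bellman optimality equation. It is an illustrative example, not code from the tutorial itself; the Gymnasium-style environment interface (reset/step over discrete state and action spaces) and the hyperparameter values are assumptions.

import numpy as np

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Tabular Q-learning on an assumed Gymnasium-style environment with
    # discrete states and actions (e.g. FrozenLake). Q[s, a] estimates the
    # optimal action-value function Q*(s, a).
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy exploration around the current greedy policy.
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Sampled Bellman optimality backup:
            # target = r + gamma * max_a' Q(s', a'), with no bootstrap at terminal states.
            target = reward + gamma * (0.0 if terminated else np.max(Q[next_state]))
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q

Replacing the Q table with a neural network (together with experience replay and a target network) gives DQN, and replacing the max over actions with a learned deterministic actor yields DDPG, following the progression the tutorial describes.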
