Reinforcement Learning: Tutorial and Survey

Benyamin Ghojogh
Ali Ghodsi


Abstract

This is a tutorial and survey paper on reinforcement learning, from fundamental reinforcement learning to deep reinforcement learning. It starts by introducing the elements of reinforcement learning. Then, the Markov decision process and policy are explained, and the Bellman equation is introduced. Next, value iteration, policy iteration, and modified policy iteration are introduced for solving a Markov decision process. The difference between reinforcement learning and the Markov decision process is then discussed, followed by temporal difference evaluation. Then, the Q function, Q-learning, the epsilon-greedy policy, gradient Q-learning, experience replay, and the deep Q network are covered. Afterwards, policy gradient and the REINFORCE algorithm are explained. Finally, the details of AlphaGo, one of the successful applications of reinforcement learning, are introduced.
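
To make two of the topics named in the abstract concrete, the following is a minimal sketch of tabular Q-learning with an epsilon-greedy policy. It is not taken from the paper: the five-state chain environment, the `step` helper, and all hyperparameter values (`alpha`, `gamma`, `epsilon`, `episodes`, `seed`) are assumptions chosen only for illustration.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy deterministic environment: reward 1.0 only on reaching the last state."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon; otherwise act greedily (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # No bootstrapping from a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy action in each non-terminal state after training.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

In this toy setting the learned greedy policy moves right in every non-terminal state, since that is the only way to reach the rewarding terminal state. A deep Q network, as covered in the paper, would replace the table `q` with a function approximator.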

Version published to 10.31219/osf.io/s98ex on OSF Preprints
Jul 22, 2024

Dynamic Feature Engineering Through Reinforcement and Prompt Based Learning

This article has 1 author:
1. Tanmay Karthik
This article has no evaluations. Latest version: May 28, 2025
Reinforcement Learning-Based Optimization Strategy for Online Advertising Budget Allocation

This article has 4 authors:
1. Mengfei Yang
2. Qiong Cao
3. Lingyun Tong
4. Jiawen Shi
This article has no evaluations. Latest version: May 28, 2025
Fictive Learning in Model-based Reinforcement Learning by Generalized Reward Prediction Errors

This article has 3 authors:
1. Jianning Chen
2. Masakazu Taira
3. Kenji Doya
This article has no evaluations. Latest version: Jun 15, 2025
