RL: thoughts, content for self-teaching and more

Learning Reinforcement Learning on your own

Outlook

The pages below contain teaching material from Olivier Sigaud’s reinforcement learning (RL) lectures. They are designed to help you learn RL on your own. Each page contains at least slides and, most often, video lessons on YouTube and notebooks for the corresponding labs. Some pages also include additional material that goes beyond the basics.

The order of the lessons below matters. If you want to understand a lot about RL and have little prior knowledge, you are encouraged to work through these lessons sequentially. For a motivated beginner, getting a good grasp of RL from the “central flow of lessons” below may take about 10 days of work.

The labs are based on the bbrl library. Before studying DQN, you should have a look at the bbrl documentation and the bbrl introductory notebooks.

These pages are continually being improved. If you have any questions or suggestions for improving the content, send a message to Olivier.Sigaud@sorbonne-universite.fr.

Reinforcement Learning: Central flow of lessons

Overview: the 5 routes to Deep RL


Tabular reinforcement learning

Tabular dynamic programming

Tabular model-free reinforcement learning

Reliable evaluation, stats and hyper-parameter tuning

Tabular model-based reinforcement learning


Deep (model-free) reinforcement learning

Deep Q-Network (DQN) and its successors

Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3)

On-policy vs Off-policy

Policy Gradient approaches

Bias vs Variance

Advantage Actor-Critic (A2C)

Trust Region Policy Optimization (TRPO) and ACKTR

Proximal Policy Optimization (PPO)

Soft Actor-Critic (SAC)

High update-to-data (UTD) ratio algorithms (TQC, DroQ)

Deep Model-free RL Wrap-Up



Direct policy search and RL

Direct policy search and RL: introduction

Direct policy search methods

Policy gradient details

Direct policy search and RL: comparisons

Direct policy search and RL: combinations

Population-based training

TD-MPC


Goal-conditioned reinforcement learning

GCRL: introduction

GCRL: core concepts

GCRL: typology

GCRL: skill learners

GCRL: hindsight experience replay

GCRL: goal reachers


Advanced RL

RLPD