The pages below contain teaching material from Olivier Sigaud’s reinforcement learning (RL) lectures. They are designed to help you learn RL on your own. Each page contains at least slides, and most often also lessons as YouTube videos and notebooks corresponding to labs. There may also be additional material to go beyond the basics.
The order of the lessons below matters: if you want to understand a lot about RL and you don’t have much prior knowledge, you are encouraged to work through them sequentially. Getting a good grasp of RL from the “central flow of lessons” below may take a motivated beginner about ten days of work.
The labs are based on the bbrl library; before studying DQN, you should have a look at the bbrl documentation and the bbrl introductory notebooks.
These pages are continually being improved. If you have any questions or suggestions about the content, send a message to Olivier.Sigaud@sorbonne-universite.fr.
Overview: the 5 routes to Deep RL
Tabular model-free reinforcement learning
Reliable evaluation, stats and hyper-parameter tuning
Tabular model-based reinforcement learning
Deep Q-Network (DQN) and its successors
Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3)
Trust Region Policy Optimization (TRPO) and ACKTR
Proximal Policy Optimization (PPO)
High UTD ratio algorithms (TQC, DroQ)
Direct policy search and RL: introduction
Direct policy search and RL: comparisons
Direct policy search and RL: combinations
Goal-conditioned RL (GCRL): Hindsight Experience Replay (HER)