RL: thoughts, content for self-teaching and more

Tabular Dynamic Programming

Slides

General introduction to RL

Markov Decision Processes

Dynamic Programming

Videos

Introduction (13 min)

MDPs (14 min)

Dynamic Programming (19 min)

Note about the videos: the slides are more recent than the videos, and the reward notation differs. In the videos, the reward resulting from applying action $a_t$ in state $s_t$ is written $r_t$; in the slides I switched to writing it $r_{t+1}$, which makes more sense.
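
To make the difference concrete, here is the same trajectory and the same discounted return written in both conventions. The discount factor $\gamma$ and the return $G_t$ are not defined on this page; they are assumed here with their usual meaning, purely as an illustration.

$$
\text{videos: } s_0, a_0, r_0, s_1, a_1, r_1, \dots
\qquad
\text{slides: } s_0, a_0, r_1, s_1, a_1, r_2, \dots
$$

$$
\text{videos: } G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k}
\qquad
\text{slides: } G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}
$$

One way to read the slide convention: the environment returns the reward together with the next state $s_{t+1}$, so both carry the index $t+1$. Only the index changes; the quantities are the same.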

Labs

https://master-dac.isir.upmc.fr/rld/rl/01-dynamic_programming.student.ipynb

Additional material

Convergence proofs for value iteration and policy iteration (borrowed from Sylvain Lamprier’s class, in French)