RL: thoughts, content for self-teaching and more

Tabular Reinforcement Learning

Slides

Temporal Difference mechanisms

Q-learning, SARSA and Tabular actor-critic

Videos

Model-Free RL (25’)

Note about the video: the slides are more recent than the video. The video writes $r_t$ for the reward resulting from applying action $a_t$ in state $s_t$; in the slides I switched to writing it $r_{t+1}$, which makes more sense.
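
To make the difference concrete, here is a sketch of the standard Q-learning update written with the slides' convention. The learning rate $\alpha$, the discount factor $\gamma$ and the temporal difference error $\delta_t$ are symbols assumed here for illustration; they are not defined on this page and the slides may name them differently.

$$
\delta_t = r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t), \qquad Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \, \delta_t
$$

With this indexing, a transition reads $(s_t, a_t, r_{t+1}, s_{t+1})$: the reward carries the same time index as the next state it is observed with. In the video's convention the same update would have $r_t$ in place of $r_{t+1}$, with everything else unchanged.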

Labs

https://master-dac.isir.upmc.fr/rld/rl/02-tabular-rl.student.ipynb

Additional material