Temporal Difference mechanisms
Q-learning, SARSA, and tabular actor-critic
Note about the video: the slides are more recent. The video denotes the reward obtained by applying action $a_t$ in state $s_t$ as $r_t$; in the slides I switched to writing it $r_{t+1}$, which makes more sense.
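For reference, the standard tabular updates written with this $r_{t+1}$ convention are:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \big] \quad \text{(Q-learning)}$$

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \big] \quad \text{(SARSA)}$$

That is, $r_{t+1}$ is the reward returned by the environment together with $s_{t+1}$, after $a_t$ is taken in $s_t$.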
https://master-dac.isir.upmc.fr/rld/rl/02-tabular-rl.student.ipynb
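For illustration, here is a minimal sketch of a tabular Q-learning loop under the $r_{t+1}$ convention. It assumes a Gymnasium-style discrete environment (`env.reset()`, `env.step()`, `observation_space.n`, `action_space.n`) and is not the notebook's actual code; all names and hyperparameters are illustrative.

```python
import numpy as np

def q_learning(env, n_episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning sketch (assumed Gymnasium-style discrete env)."""
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(n_episodes):
        s, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection over the current Q estimates.
            if np.random.rand() < epsilon:
                a = env.action_space.sample()
            else:
                a = int(np.argmax(Q[s]))
            # r here is r_{t+1}: the reward observed after taking a_t in s_t,
            # returned by the environment together with s_{t+1}.
            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            # Off-policy TD target; bootstrapping is cut at terminal states.
            target = r + gamma * np.max(Q[s_next]) * (not terminated)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```

Replacing the $\max_{a'} Q(s_{t+1}, a')$ term with the value of the action actually selected in $s_{t+1}$ turns this into SARSA, the on-policy variant.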