Note about the videos: the slides are more recent. The videos write $r_t$ for the reward obtained by applying action $a_t$ in state $s_t$; in the slides I switched to writing it $r_{t+1}$, which makes more sense.
https://master-dac.isir.upmc.fr/rld/rl/01-dynamic_programming.student.ipynb