RL: thoughts, content for self-teaching and more

Direct policy search and reinforcement learning

TD-MPC

Slides

Warning: the video says that the Prioritized Experience Replay implementation ranks samples by their value, which would make no sense. The video is wrong on this point: the implementation ranks samples by their value loss, that is, the temporal difference error, which is correct. This has been fixed in the slides.
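To make the distinction concrete, here is a minimal sketch of proportional prioritization driven by the one-step temporal difference error. It is a generic illustration, not the TD-MPC implementation: the function names, the exponent `alpha=0.6`, the small constant `eps` and the toy numbers are all assumptions chosen for the example.

```python
import random


def td_errors(rewards, values, next_values, dones, gamma=0.99):
    """One-step TD error: r + gamma * V(s') - V(s), without bootstrapping at terminal steps."""
    return [
        r + gamma * nv * (0.0 if d else 1.0) - v
        for r, v, nv, d in zip(rewards, values, next_values, dones)
    ]


def priorities_from_td(errors, alpha=0.6, eps=1e-6):
    """Priority grows with |TD error|; eps keeps every sample reachable (assumed hyperparameters)."""
    return [(abs(e) + eps) ** alpha for e in errors]


def sample_indices(priorities, batch_size, rng=random):
    """Draw indices with probability proportional to priority."""
    return rng.choices(range(len(priorities)), weights=priorities, k=batch_size)


# Toy buffer of three transitions (made-up numbers).
rewards = [0.0, 0.0, 1.0]
values = [5.0, 0.1, 0.2]        # current estimates V(s)
next_values = [5.0, 0.1, 0.0]   # current estimates V(s')
dones = [False, False, True]

errors = td_errors(rewards, values, next_values, dones)
prio = priorities_from_td(errors)

# Rank transitions from highest to lowest priority.
ranking = sorted(range(len(prio)), key=lambda i: prio[i], reverse=True)
batch = sample_indices(prio, batch_size=2)
```

In this toy buffer the first transition has by far the largest value but an almost consistent estimate, so its priority is low; the last transition has a small value but a large TD error, so it is ranked first. Ranking by value would instead keep replaying the first transition, from which there is little left to learn.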

TD-MPC

Video

TD-MPC (19’)