Chadrick Blog

Reinforcement learning: on-policy vs. off-policy

There is a nice explanation here; see the second answer.
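The standard textbook contrast (not taken from the linked answer) is SARSA, which is on-policy, versus Q-learning, which is off-policy. Both are tabular TD updates. They differ only in what they bootstrap from in the next state. SARSA uses the action the behavior policy actually takes there. Q-learning uses the greedy action, whatever the behavior policy does. The function names, the list-of-lists Q-table, and the `alpha`/`gamma` defaults below are illustrative assumptions, a minimal sketch rather than a full agent.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update (SARSA). Q is a hypothetical table: Q[state][action] -> value."""
    # Bootstrap from a_next, the action the behavior policy actually chose in s_next
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy TD update (Q-learning), using the same hypothetical Q-table layout."""
    # Bootstrap from the greedy (max-value) action in s_next, ignoring what the behavior policy will do
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


# Same transition for both: state 0, action 0, reward 0, next state 1.
# In state 1 an exploratory policy then picks action 0, even though action 1 looks better.
Q_sarsa = [[0.0, 0.0], [1.0, 3.0]]
Q_qlearn = [[0.0, 0.0], [1.0, 3.0]]
sarsa_update(Q_sarsa, 0, 0, 0.0, 1, a_next=0)
q_learning_update(Q_qlearn, 0, 0, 0.0, 1)
print(Q_sarsa[0][0], Q_qlearn[0][0])
```

With these numbers SARSA moves `Q[0][0]` to about 0.09, because it follows the exploratory action it actually took. Q-learning moves it to about 0.27, because it learns the value of the greedy policy regardless of how the data was collected.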