MountainCar RL tips

When adding samples, I modified the reference code to exclude samples from terminal states, hoping this would make the batch creation process less complicated.

The original code:

The modified code:

However, this small change made a huge difference in training convergence. With the modification, the agent never managed to get the total reward …
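The following is a minimal sketch of the alternative to dropping terminal samples. It assumes a DQN-style replay buffer, and the names `Transition`, `ReplayBuffer` and `td_targets` are hypothetical, not taken from the reference code. Terminal transitions are kept, and a `done` flag masks the bootstrap term, so the target for a terminal sample is just its reward. One plausible reason the exclusion hurts: every MountainCar step yields a reward of -1, so the transition that reaches the goal is the only one whose target is not bootstrapped from a further (negative) value estimate. Without it, the agent has no sample telling it that the goal ends the stream of penalties.

```python
import random
from collections import deque
from typing import List, NamedTuple, Tuple


class Transition(NamedTuple):
    state: Tuple[float, float]       # MountainCar observation: (position, velocity)
    action: int
    reward: float
    next_state: Tuple[float, float]
    done: bool                       # True only when the episode truly terminated (goal reached)


class ReplayBuffer:
    """Hypothetical minimal buffer that stores every transition, terminal ones included."""

    def __init__(self, capacity: int) -> None:
        self.memory: deque = deque(maxlen=capacity)

    def add(self, t: Transition) -> None:
        # Do NOT filter out terminal transitions: they carry the "goal reached" signal.
        self.memory.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform random minibatch without replacement.
        return random.sample(list(self.memory), batch_size)


def td_targets(batch: List[Transition], next_q_max: List[float], gamma: float) -> List[float]:
    """Return r + gamma * (1 - done) * max_a' Q(s', a') for each transition.

    next_q_max[i] is assumed to be max_a' Q(s'_i, a'), e.g. from a target network.
    For terminal transitions the bootstrap term is masked out, so the target is just r.
    """
    return [
        t.reward + gamma * (0.0 if t.done else 1.0) * q
        for t, q in zip(batch, next_q_max)
    ]
```

With this masking, terminal samples fit into the same batch as ordinary ones, so no special-case batch logic is needed. If the environment also cuts episodes off by a step limit (MountainCar-v0 stops at 200 steps), that cutoff is arguably not a true terminal state, and such transitions would normally keep `done=False`.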