when adding samples, I modified the reference code to exclude
terminating status samples in hope that this would less complicate the batch creation process.

the original code:

self.samplestorage.add_sample(state, action, reward, next_state)

the modified code:

if done:
pass
else:
self.samplestorage.add_sample(state, action, reward, next_state)

However, this small change made a huge difference in training convergence.

The modification failed to ever get the total reward for each episode to increase.

The original code was able to learn something and increase its total reward per episode after a few episode runs.

So the tip here is to always include the terminating sample even though the termination does not mean accomplishing the ultimate objective.

Categories: deep learning