Reward shaping not removed in test.py

Hi,

I noticed that you used reward shaping during training but forgot to remove it when calculating the running average and in `test.py`, which leads to incorrect reported scores.
After commenting out the relevant lines, I got an average score of around 860 over 100 episodes for the model in the repo.
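
In case it helps, here is a rough sketch of one way to keep the two apart: accumulate the raw environment reward for reporting, and apply shaping only to the signal the agent learns from. This is not the repo's code. `shaped`, `run_episode`, the `+ 0.5` bonus, and the reward values are all hypothetical placeholders for illustration.

```python
from collections import deque


def shaped(reward: float) -> float:
    # Hypothetical shaping rule, for illustration only;
    # the actual shaping used in the repo will differ.
    return reward + 0.5


def run_episode(raw_rewards, train=True):
    """Return (raw_score, shaped_score) for one episode.

    raw_score is the unmodified environment return and is what should be
    reported. shaped_score is only accumulated when training.
    """
    raw_score = 0.0
    shaped_score = 0.0
    for r in raw_rewards:
        raw_score += r  # always track the true reward
        if train:
            shaped_score += shaped(r)  # learning signal only
    return raw_score, shaped_score


# The running average is fed the raw score, never the shaped one.
running = deque(maxlen=100)
raw, _ = run_episode([1.0, 2.0, -1.0])
running.append(raw)
average_score = sum(running) / len(running)
```

With something like this, `test.py` could call the episode loop with `train=False` and report only the raw score.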