[RL-reinforce]: learningrate exp #1 #37
Experiments on the REINFORCE algorithm, trying to figure out why entropy collapses and the policy converges too quickly to actions with prob = 1.
Here we vary the learning rate and observe its effect on entropy, loss, and running reward.
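For reference, the three quantities tracked here (loss, entropy, running reward) fall out of a standard REINFORCE update. The sketch below is only an illustration of that, assuming a PyTorch policy with a categorical action head; names such as `reinforce_update`, `log_probs`, and `entropies` are placeholders, not this repo's actual functions.

```python
import torch

def reinforce_update(optimizer, log_probs, entropies, rewards, gamma=0.99):
    """One policy-gradient update from a single finished episode (illustrative only)."""
    # Discounted returns, computed backwards over the episode.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Vanilla REINFORCE loss: -log pi(a|s) * return, summed over the episode.
    loss = -(torch.stack(log_probs) * returns).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Mean policy entropy over the episode -- the quantity that collapses
    # when the policy converges too quickly to near-deterministic actions.
    return loss.item(), torch.stack(entropies).mean().item()
```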
After 30.8k episodes, the model reached a maximum running reward of ~590, at around episode 27k. The best performance was achieved with a learning rate of 1e-05.
Explanation and conclusions below.
Tested learning rates:
5e-04 (orange) / 1e-04 (sea blue) / 1e-05 (green) / 1e-06 (pink) / 1e-08 (sky blue)
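The sweep was simply one training run per learning rate; a minimal sketch of such a driver is below, where `train` is a hypothetical stand-in for this repo's actual training entry point.

```python
# Hypothetical sweep driver for the learning rates listed above.
def train(lr, max_episodes, log_dir):
    ...  # stand-in for the repo's actual training loop

learning_rates = [5e-4, 1e-4, 1e-5, 1e-6, 1e-8]
for lr in learning_rates:
    train(lr=lr, max_episodes=30_800, log_dir=f"runs/reinforce_lr_{lr:.0e}")
```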


Note:
From that starting point, we decided to extend the training run using the 1e-05 learning rate. Results below.

At this point, we can sometimes get rewards of up to 810. The agent is even capable of recovering from massive donut burnouts.
openaigym.video.0.32231.video000000.mp4
REINFORCE may be able to solve the CarRacing environment, but it still has several weak points:
We need to use a small learning rate to avoid the policy getting stuck and not learning.
As a result of the small learning rate, training is very slow.
It is not clear yet whether we will reach our target (running reward = 900) with the REINFORCE algorithm.
Other approaches and extensions of basic REINFORCE should give better results; one example is sketched below.
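As an illustration of one such extension (an assumption on our side, not something implemented in this PR), an entropy bonus can be added to the loss so the optimiser is explicitly penalised for collapsing the policy to prob = 1 actions; `beta` below is a new, untuned hyperparameter.

```python
import torch

def reinforce_loss_with_entropy_bonus(log_probs, entropies, returns, beta=0.01):
    # Same REINFORCE objective as before, plus an entropy bonus that pushes
    # back against the policy becoming deterministic too early.
    pg_loss = -(torch.stack(log_probs) * returns).sum()
    entropy_bonus = torch.stack(entropies).sum()
    return pg_loss - beta * entropy_bonus
```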