[RL-reinforce]: learningrate exp #1 #37

Open · wants to merge 7 commits into main

Conversation

jaimepedretp
Collaborator

Experiments on the REINFORCE algorithm, trying to figure out why entropy is collapsing and why the policy converges too quickly to actions with probability = 1.

Here we are changing the learning rate and observing the behavior of entropy, loss, and running reward.
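
As a rough illustration of how these three quantities can be produced, below is a minimal REINFORCE update that logs the policy entropy alongside the loss and episode reward. This is only a sketch under assumptions (PyTorch, a discretized action set for CarRacing, and a gymnasium-style `env` API); the actual code in this branch may differ.

```python
import torch

def reinforce_episode(env, policy, optimizer, gamma=0.99):
    """One REINFORCE update; returns (loss, mean policy entropy, episode reward).

    `policy` is assumed to map an observation to action logits over a
    discretized CarRacing action set; `optimizer` could be e.g.
    torch.optim.Adam(policy.parameters(), lr=1e-5).
    """
    obs, _ = env.reset()
    log_probs, entropies, rewards = [], [], []
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        entropies.append(dist.entropy())  # entropy -> 0 means one action gets prob ~1
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted, normalized returns computed backwards over the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Vanilla REINFORCE loss: -sum(log pi(a|s) * return).
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    return loss.item(), torch.stack(entropies).mean().item(), float(sum(rewards))
```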

After 30.8k episodes, the model reached a maximum running reward of ~590 at around episode 27k. The best performance was achieved using an LR of 1e-05.
Explanation and conclusions below.
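
("Running reward" is assumed here to be an exponentially smoothed episode return, as in the classic REINFORCE examples; the exact smoothing factor used in this branch is not shown.)

```python
# Assumed definition of the running reward: an exponential moving average of
# episode returns (alpha = 0.05 is a guess, not necessarily this repo's value).
def update_running_reward(running_reward, episode_reward, alpha=0.05):
    return alpha * episode_reward + (1.0 - alpha) * running_reward
```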

Tested learning rates

5e-04 (orange) / 1e-04 (sea blue) / 1e-05 (green) / 1e-06 (pink) / 1e-08 (sky blue)
[Plots: training curves (entropy, loss, running reward) for each learning rate]

Note:

  • Learning rates lower than 1e-05 make the policy collapse and entropy go to 0.
  • Entropy of the model with lr=5e-04 goes to 0 immediately after the start of training.
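
Driving the sweep over the learning rates listed above can look roughly like the sketch below, which re-runs the same per-episode training step for each LR and flags runs whose entropy collapses. The helper names (`make_env`, `make_policy`, `reinforce_episode`) and the collapse threshold are assumptions for illustration, not code from this branch.

```python
import torch

LEARNING_RATES = [5e-4, 1e-4, 1e-5, 1e-6, 1e-8]  # the values tested above
ENTROPY_FLOOR = 0.05  # assumed threshold below which we call the policy "collapsed"

def sweep(make_env, make_policy, reinforce_episode, n_episodes=30_000):
    results = {}
    for lr in LEARNING_RATES:
        env, policy = make_env(), make_policy()
        optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
        entropies, best_reward = [], float("-inf")
        for ep in range(n_episodes):
            _, entropy, ep_reward = reinforce_episode(env, policy, optimizer)
            entropies.append(entropy)
            best_reward = max(best_reward, ep_reward)
            # Flag collapse: mean entropy over the last 100 episodes near zero.
            if ep >= 100 and sum(entropies[-100:]) / 100.0 < ENTROPY_FLOOR:
                print(f"lr={lr:.0e}: entropy collapsed at episode {ep}")
                break
        results[lr] = best_reward
    return results
```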

From that starting point, we decided to extend the training time using an LR of 1e-05. Results below.
[Plots: training curves for the extended run with lr=1e-05]

At this point, we can sometimes get rewards of up to 810. The agent is even capable of recovering from massive donut burnouts.

openaigym.video.0.32231.video000000.mp4

REINFORCE may be able to solve the CarRacing environment, but it still has several weak points:

  • We need to use a small learning rate to keep the policy from getting stuck and not learning.

  • As a result of the small learning rate, training is very slow.

  • It is not yet clear whether we will reach our target (running reward = 900) using the REINFORCE algorithm.

Other approaches and extensions of basic REINFORCE should show better results.
