Implemented hydra heads + adaptive kl #33
  model_type : "AcceleratePPOModel" # Name of accelerate model type to load
  device : "cuda" # Train device
- num_layers_unfrozen : -1 # Number of bottom layers to freeze during training
+ num_layers_unfrozen : 2 # Number of bottom layers to freeze during training
Why are we changing this in the default config?
See comment below
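For context on the `num_layers_unfrozen` option in this hunk: judging by its name, it selects how many of the top transformer blocks stay trainable, with `-1` meaning nothing is frozen. That reading is an assumption (the inline comment phrases it in terms of frozen bottom layers). A minimal pure-Python sketch of which block indices would be frozen under that assumption; `frozen_block_indices` is a hypothetical helper, not part of this PR:

```python
from typing import List


def frozen_block_indices(num_layers: int, num_layers_unfrozen: int) -> List[int]:
    """Return the indices of the transformer blocks to freeze.

    Hypothetical helper illustrating one reading of `num_layers_unfrozen`:
    the top `num_layers_unfrozen` blocks stay trainable, every block below
    them is frozen, and -1 means no block is frozen.
    """
    if num_layers_unfrozen == -1:
        return []  # train the whole model
    if not 0 <= num_layers_unfrozen <= num_layers:
        raise ValueError("num_layers_unfrozen must be -1 or in [0, num_layers]")
    # Blocks are ordered bottom (index 0) to top (index num_layers - 1).
    return list(range(num_layers - num_layers_unfrozen))


# GPT-2 medium has 24 transformer blocks; with the new default of 2,
# blocks 0..21 would be frozen and blocks 22 and 23 trained.
print(len(frozen_block_indices(24, 2)))  # 22
print(frozen_block_indices(24, -1))      # []
```

In PyTorch terms, one would then set `requires_grad = False` on the parameters of each returned block, which is where the speed and memory savings of a small value come from.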
  cliprange : 0.2 # clip range
  cliprange_value : 0.2 # clip range
- vf_coef : 0.2 # value term weight
+ vf_coef : 2.3 # value term weight
Likewise here.
I found these parameters work a lot better for quickly checking whether the reward is increasing.
Got it. Sounds good.
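The three options in this hunk clip and weight the PPO objective. Assuming the usual clipped-surrogate formulation with a clipped value loss (the PR's actual loss code is not shown here), a per-sample sketch of where `cliprange`, `cliprange_value` and `vf_coef` enter; the function name and arguments are illustrative, and the defaults mirror the new config values:

```python
import math


def ppo_loss(logprob, old_logprob, advantage, value, old_value, ret,
             cliprange=0.2, cliprange_value=0.2, vf_coef=2.3):
    """Per-sample PPO loss to minimise; an illustrative sketch, not the PR's code."""
    # Policy term: clipped surrogate objective on the probability ratio.
    ratio = math.exp(logprob - old_logprob)
    clipped_ratio = min(max(ratio, 1.0 - cliprange), 1.0 + cliprange)
    pg_loss = max(-advantage * ratio, -advantage * clipped_ratio)

    # Value term: limit how far the new value estimate may move from the old one.
    delta = min(max(value - old_value, -cliprange_value), cliprange_value)
    value_clipped = old_value + delta
    vf_loss = 0.5 * max((value - ret) ** 2, (value_clipped - ret) ** 2)

    # vf_coef scales the value term relative to the policy term.
    return pg_loss + vf_coef * vf_loss


print(ppo_loss(0.0, 0.0, 1.0, 0.0, 0.0, 0.0))  # -1.0
```

Raising `vf_coef` from 0.2 to 2.3 makes the value loss dominate the shared gradient, which fits the stated goal of quick sanity checks rather than a tuned default.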
@@ -0,0 +1,52 @@
model:
This config would be great for CI.
# Cell
class ModelBranch(PreTrainedModel):
This class needs a high-level overview comment explaining how it works.
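As a starting point for that overview: the PR description says `ModelBranch` supports multi-headed "hydra" models. Our understanding of the idea, which is an assumption since the class body is not shown here, is that the frozen bottom layers run once and are shared, while a branch holding a frozen copy of the top `num_layers_unfrozen` layers produces the reference model's outputs from the same hidden states, so no second full model is kept in memory. A toy pure-Python sketch of that data flow, with scalar "layers" standing in for transformer blocks; `Affine` and `Hydra` are hypothetical names:

```python
import copy
from typing import List, Tuple


class Affine:
    """Toy stand-in for a transformer block: x -> w * x + b."""

    def __init__(self, w: float, b: float):
        self.w, self.b = w, b

    def __call__(self, x: float) -> float:
        return self.w * x + self.b


class Hydra:
    """Shared trunk + trainable policy top + reference branch (toy sketch)."""

    def __init__(self, layers: List[Affine], num_layers_unfrozen: int):
        split = len(layers) - num_layers_unfrozen
        self.trunk = layers[:split]          # frozen in a real model; shared by both heads
        self.policy_top = layers[split:]     # the only layers that would be trained
        # Snapshot of the top layers taken before training: the reference branch.
        self.ref_branch = copy.deepcopy(self.policy_top)
        self.trunk_calls = 0

    def forward(self, x: float) -> Tuple[float, float]:
        self.trunk_calls += 1
        h = x
        for layer in self.trunk:             # run the trunk once ...
            h = layer(h)
        policy_out, ref_out = h, h
        for layer in self.policy_top:        # ... and feed its output to both heads
            policy_out = layer(policy_out)
        for layer in self.ref_branch:
            ref_out = layer(ref_out)
        return policy_out, ref_out


model = Hydra([Affine(2.0, 1.0) for _ in range(4)], num_layers_unfrozen=2)
print(model.forward(1.0))       # (31.0, 31.0): heads agree before training
model.policy_top[-1].w = 3.0    # simulate an update to the policy's top layer
print(model.forward(1.0))       # (46.0, 31.0): reference branch is unchanged
```

Only the top layers are duplicated, so the extra memory for the reference model scales with `num_layers_unfrozen` rather than with the full depth.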
@@ -0,0 +1,52 @@
import unittest
This is a useful class, but I think we should handle unit tests in a separate PR.
I kept it in this PR because it tests the ModelBranch implementation.
Got it. It needs a lot of work; let's chat later.
Implemented the ModelBranch class to support multi-headed, hydra-type models. Also added an adaptive KL controller.
This achieves a 4x training speedup on GPT2-medium and a 10x training speedup on GPT-J, and halves the memory footprint.
I also added unit tests.
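The description does not spell out the adaptive KL controller's update rule. A common choice, and our assumption here, is the adaptive coefficient from Ziegler et al. (2019): the KL penalty coefficient is nudged up when the measured KL between policy and reference model exceeds a target and down when it falls below, with the proportional error clipped to ±0.2. The class and parameter names (`init_kl_coef`, `target`, `horizon`) are illustrative, not necessarily those used in the PR:

```python
class AdaptiveKLController:
    """Adaptive KL penalty coefficient (sketch after Ziegler et al., 2019)."""

    def __init__(self, init_kl_coef: float, target: float, horizon: int):
        self.value = init_kl_coef  # current KL penalty coefficient (beta)
        self.target = target       # desired KL between policy and reference model
        self.horizon = horizon     # steps over which a KL error is corrected

    def update(self, current_kl: float, n_steps: int) -> float:
        # Proportional error, clipped so a single update changes beta by a bounded factor.
        proportional_error = min(max(current_kl / self.target - 1.0, -0.2), 0.2)
        self.value *= 1.0 + proportional_error * n_steps / self.horizon
        return self.value


kl_ctl = AdaptiveKLController(init_kl_coef=0.2, target=6.0, horizon=10000)
kl_ctl.update(current_kl=12.0, n_steps=256)  # KL above target, so beta grows slightly
```

The updated `kl_ctl.value` would then scale the per-token penalty subtracted from the reward, roughly `reward - kl_ctl.value * kl`.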