Add ppo randomwalks example #119
It might be better to host this model, https://huggingface.co/randomwalks, under https://huggingface.co/CarperAI or something similar, if the randomwalks repo is the official test example for experimental features, as written in CONTRIBUTING.md.
@jon-tow What do you think about Daniel's comment? Also, just for the record, I changed setup.cfg so that the top-level import in randomwalks resolves both from the repository root and when imported from trlx/sweep.py.
Nice job! I ran everything locally and it worked great 👍 I left some nits if you could address them before merging.
@reciprocated I think it might be easier to find under CarperAI, but ultimately it's up to you given that it's your model 😄 If you want to transfer it, feel free!
Host under Carper
@jon-tow Moved the configs and moved the model under Carper. I also added networkx as a dependency, since it's pretty lightweight and follows a distinct naming pattern 🙂 PPO run from a fresh git clone: https://wandb.ai/sorry/trlx/runs/2kh7e8q6
Looks great to me! 👏
This PR extends the randomwalks example to PPO. It also changes the reward so that it is measured in proportion to the shortest path lengths. This doesn't change much empirically, since this is a very easy example, but I find it more interpretable than raw negative lengths.
ILQL: https://wandb.ai/sorry/public/runs/1rl3ls2d
PPO: https://wandb.ai/sorry/public/runs/2vviyvwg
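To illustrate what a reward "in proportion to the shortest path lengths" could look like, here is a minimal sketch using networkx. It is not the code from this PR: the function names `walk_length` and `optimality`, the `max_length` cap, and the toy graph are all assumptions made for illustration. The score is 1.0 for a shortest path, decreases linearly as the walk gets longer, and is 0.0 for invalid walks or walks that hit the cap.

```python
import networkx as nx


def walk_length(graph, path, goal):
    """Number of edges walked before reaching `goal`, or None if the walk
    uses a missing edge or never reaches the goal (hypothetical helper)."""
    for i, node in enumerate(path):
        if i > 0 and not graph.has_edge(path[i - 1], node):
            return None  # invalid step: no such edge in the graph
        if node == goal:
            return i
    return None


def optimality(graph, path, goal, max_length):
    """Score in [0, 1]: 1.0 for a shortest path, 0.0 for an invalid or
    capped walk. `max_length` is an assumed worst-case walk length."""
    length = walk_length(graph, path, goal)
    if length is None or length >= max_length:
        return 0.0
    shortest = nx.shortest_path_length(graph, source=path[0], target=goal)
    return (max_length - length) / (max_length - shortest)


# Toy directed graph (illustrative only): the shortest route 0 -> 3 is 0 -> 2 -> 3.
graph = nx.DiGraph()
graph.add_edges_from([(0, 1), (1, 2), (2, 3), (0, 2)])

best = optimality(graph, [0, 2, 3], goal=3, max_length=5)
detour = optimality(graph, [0, 1, 2, 3], goal=3, max_length=5)
invalid = optimality(graph, [0, 3], goal=3, max_length=5)
```

Compared with a raw negative length, a normalized score like this reads the same way regardless of how far the start node is from the goal, which is the interpretability benefit described above.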