Skip to content

Conversation

maxreciprocate
Copy link
Collaborator

@maxreciprocate maxreciprocate commented Nov 28, 2022

This pr extends randomwalks example for ppo. Also it changes its reward to be measured in proportion against shortest lengths, which doesn't change much empirically since this is a very easy example, but I find it more interpretable then raw negative lengths.

ILQL: https://wandb.ai/sorry/public/runs/1rl3ls2d
PPO: https://wandb.ai/sorry/public/runs/2vviyvwg

@dpaleka
Copy link

dpaleka commented Nov 29, 2022

It might be better to host this model https://huggingface.co/randomwalks under https://huggingface.co/CarperAI or something similar, if the randomwalks repo is the official test example for experimental features as written in CONTRIBUTING.md.

@dpaleka dpaleka mentioned this pull request Nov 29, 2022
@maxreciprocate
Copy link
Collaborator Author

@jon-tow What do you think about Daniel's comment? Also just for the record I changed setup.cfg to let the top import in randomwalks be both visible from the root and when importing from trlx/sweep.py

@maxreciprocate maxreciprocate marked this pull request as ready for review November 29, 2022 22:50
Copy link
Collaborator

@jon-tow jon-tow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! I ran everything locally and it worked great 👍 I left some nits if you could address them before merging.

@jon-tow
Copy link
Collaborator

jon-tow commented Nov 29, 2022

@jon-tow What do you think about Daniel's comment? Also just for the record I changed setup.cfg to let the top import in randomwalks be both visible from the root and when importing from trlx/sweep.py

@reciprocated I think it might be easier to find under CarperAI but ultimately it's up to you given that it's your model 😄 If you want to transfer; feel free!

@LouisCastricato
Copy link
Contributor

Host under Carper

@maxreciprocate
Copy link
Collaborator Author

@jon-tow Moved configs and moved the model under carper. Also I added networkx as dependency, since it's pretty lightweight and follows a distinct naming pattern 🙂

ppo run from fresh git clone: https://wandb.ai/sorry/trlx/runs/2kh7e8q6

Copy link
Collaborator

@jon-tow jon-tow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! 👏

@maxreciprocate maxreciprocate merged commit a94eefc into main Nov 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants