
Conversation

maxreciprocate (Collaborator)
This PR fixes #107:

  • lets models in the orchestrator use attention_mask when computing old_logprobs

but also:

  • removes entity_name from ppo_config (which I suspect wasn't intentional)
  • enables more thorough logging, synced with the orchestrator's logging via iter_count
  • adds the git commit as a tag to wandb runs

https://wandb.ai/sorry/public/reports/Ratio-fix--VmlldzozMDE4NzI2
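The masking fix above can be sketched roughly as follows. This is a minimal PyTorch illustration, not trlx's actual code: `masked_logprobs` and its tensor shapes are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def masked_logprobs(logits, input_ids, attention_mask):
    """Log-probs of each realized token, zeroed at padding positions.

    Hypothetical helper for illustration:
      logits: [batch, seq, vocab]; input_ids, attention_mask: [batch, seq].
    The token at position t is predicted from the logits at position t-1,
    hence the one-step shift below.
    """
    logprobs = F.log_softmax(logits[:, :-1, :], dim=-1)
    # Pick out the log-prob of the token that was actually generated
    token_logprobs = logprobs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Zero out padding so it never enters the PPO ratio
    return token_logprobs * attention_mask[:, 1:]
```

If old_logprobs and the per-step logprobs are both computed this way (same mask, same shift), padding tokens contribute nothing to the ratio, which is the symptom #107 describes.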

@Dahoas (Collaborator) commented Nov 22, 2022

Perhaps add the ratio == 1 check as a unit test?
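Such a test could look roughly like the sketch below. It only demonstrates the invariant being suggested — before any optimizer step, the policy that produced old_logprobs is the same policy being evaluated, so exp(new - old) must be exactly 1. The function name and tensor shapes are hypothetical, not trlx's test suite.

```python
import torch

def test_initial_ppo_ratio_is_one():
    # At the start of PPO training no update has happened yet, so the
    # "new" per-token log-probs must equal the stored old_logprobs and
    # the importance ratio must be 1 everywhere.
    torch.manual_seed(0)
    old_logprobs = torch.randn(4, 16)          # stand-in for stored rollout log-probs
    new_logprobs = old_logprobs.clone()        # same model, no update yet
    ratio = torch.exp(new_logprobs - old_logprobs)
    assert torch.allclose(ratio, torch.ones_like(ratio))
```

In a real trlx test, old_logprobs and new_logprobs would come from two forward passes of the same unmodified model (with the attention mask applied in both), rather than from a clone.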

@jon-tow jon-tow self-requested a review November 22, 2022 20:23
@jon-tow (Collaborator) left a comment

Looks great @reciprocated 👏

@jon-tow jon-tow merged commit 8edcd9d into main Nov 22, 2022
@maxreciprocate maxreciprocate deleted the 107-fix-ppo-ratio branch November 22, 2022 21:31

Successfully merging this pull request may close these issues.

Ratio != 1 at start of PPO training (during loss function calculation)
3 participants