
Conversation

maxreciprocate (Collaborator)
This PR fixes #107:

  • lets models in the orchestrator use attention_mask when computing old_logprobs

but also:

  • removes entity_name from ppo_config (which I suspect wasn't intentional)
  • enables more thorough logging, synced with the orchestrator's logging via iter_count
  • adds the git commit as a tag to wandb runs

https://wandb.ai/sorry/public/reports/Ratio-fix--VmlldzozMDE4NzI2
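The masking fix above can be sketched roughly as follows. This is a minimal PyTorch illustration, not trlx's actual code: `masked_logprobs` and its tensor shapes are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def masked_logprobs(logits, input_ids, attention_mask):
    """Log-probs of each realized token, zeroed at padding positions.

    Hypothetical helper for illustration:
      logits: [batch, seq, vocab]; input_ids, attention_mask: [batch, seq].
    The token at position t is predicted from the logits at position t-1,
    hence the one-step shift below.
    """
    logprobs = F.log_softmax(logits[:, :-1, :], dim=-1)
    # Pick out the log-prob of the token that was actually generated
    token_logprobs = logprobs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Zero out padding so it never enters the PPO ratio
    return token_logprobs * attention_mask[:, 1:]
```

If old_logprobs and the per-step logprobs are both computed this way (same mask, same shift), padding tokens contribute nothing to the ratio, which is the symptom #107 describes.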

@Dahoas (Collaborator) commented Nov 22, 2022

Perhaps add the ratio == 1 check as a unit test?
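Such a test could look roughly like the sketch below. It only demonstrates the invariant being suggested — before any optimizer step, the policy that produced old_logprobs is the same policy being evaluated, so exp(new - old) must be exactly 1. The function name and tensor shapes are hypothetical, not trlx's test suite.

```python
import torch

def test_initial_ppo_ratio_is_one():
    # At the start of PPO training no update has happened yet, so the
    # "new" per-token log-probs must equal the stored old_logprobs and
    # the importance ratio must be 1 everywhere.
    torch.manual_seed(0)
    old_logprobs = torch.randn(4, 16)          # stand-in for stored rollout log-probs
    new_logprobs = old_logprobs.clone()        # same model, no update yet
    ratio = torch.exp(new_logprobs - old_logprobs)
    assert torch.allclose(ratio, torch.ones_like(ratio))
```

In a real trlx test, old_logprobs and new_logprobs would come from two forward passes of the same unmodified model (with the attention mask applied in both), rather than from a clone.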

@jon-tow jon-tow self-requested a review November 22, 2022 20:23
@jon-tow (Collaborator) left a comment

Looks great @reciprocated 👏

@jon-tow jon-tow merged commit 8edcd9d into main Nov 22, 2022
@maxreciprocate maxreciprocate deleted the 107-fix-ppo-ratio branch November 22, 2022 21:31

Successfully merging this pull request may close these issues.

Ratio != 1 at start of PPO training (during loss function calculation)
3 participants