
Conversation

@jon-tow (Collaborator) commented Nov 10, 2022

This PR decouples the PPO loss function from the AcceleratePPOModel class for general reuse, following @cat-state's work in #75. The following updates are also included:

  • Adds the unbiased KL-divergence estimator following http://joschu.net/blog/kl-approx.html (a rough sketch is included after this list).
  • Removes the repeated initialization of self.kl_ctl in the AcceleratePPOModel constructor.
  • Removes utils.modeling.clip_by_value, as this functionality (torch.clamp with tensor-valued bounds) was added to PyTorch nearly 2 years ago.
  • Logs additional PPO statistics for easier debugging.
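
For context, here is a minimal sketch of the estimator linked above (the "k3" estimator from Schulman's note), assuming per-token log-probabilities with samples drawn under the old policy; the function name and signature are illustrative rather than the exact code in this PR:

```python
import torch


def approx_kl(log_probs: torch.Tensor, old_log_probs: torch.Tensor) -> torch.Tensor:
    """Per-token estimate of KL(pi_old || pi_new) via the unbiased,
    low-variance "k3" estimator from http://joschu.net/blog/kl-approx.html."""
    # log r, where r = pi_new(x) / pi_old(x) and x ~ pi_old
    log_ratio = log_probs - old_log_probs
    # k3 = (r - 1) - log r: always >= 0, and unbiased since E_old[r] = 1
    return (log_ratio.exp() - 1) - log_ratio
```

Averaging this per-token quantity gives a scalar KL value that can be logged alongside the other PPO statistics mentioned above.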

Test Run: https://wandb.ai/jon-tow/ppo-test

@jon-tow marked this pull request as ready for review November 10, 2022 22:24
@LouisCastricato (Contributor)

This looks great on my side. @Dahoas, I'm open to merging this tonight if that's OK with you too?

@cat-state (Collaborator) left a comment

LGTM!
