[Enhancement] Log mean/max KL across PPO update phase #523

pjain03 · 2025-08-27T01:27:50Z

Description

In ppo.py (and ppo_continuous_action.py), losses/approx_kl and losses/old_approx_kl reflect only the last minibatch of the final epoch in each update. This PR proposes also logging simple aggregates over the whole PPO update phase:

- losses/approx_kl_mean (mean over all minibatches/epochs in the update)
- losses/approx_kl_max (max over all minibatches/epochs in the update)
- losses/old_approx_kl_mean
- losses/old_approx_kl_max

This is a small, non-performance-impacting quality-of-life improvement: the aggregates can make KL trends easier to read in noisier settings. No behavior changes; logging only.
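
For illustration, a minimal sketch of the proposed aggregation in plain Python rather than the PyTorch code in ppo.py. The helper names `kl_estimates` and `aggregate_kl` are hypothetical and not part of this PR; the sketch assumes the two estimators are the mean of `-logratio` (old_approx_kl) and the mean of `(ratio - 1) - logratio` (approx_kl), computed once per minibatch.

```python
import math
from statistics import fmean


def kl_estimates(new_logprobs, old_logprobs):
    """Return (old_approx_kl, approx_kl) for a single minibatch.

    Hypothetical helper: assumes old_approx_kl = mean(-logratio) and
    approx_kl = mean((ratio - 1) - logratio).
    """
    logratios = [new - old for new, old in zip(new_logprobs, old_logprobs)]
    old_approx_kl = fmean([-lr for lr in logratios])
    approx_kl = fmean([(math.exp(lr) - 1.0) - lr for lr in logratios])
    return old_approx_kl, approx_kl


def aggregate_kl(minibatches):
    """Aggregate KL estimates over every minibatch of every epoch in one update.

    `minibatches` is an iterable of (new_logprobs, old_logprobs) pairs.
    """
    approx_kls, old_approx_kls = [], []
    for new_logprobs, old_logprobs in minibatches:
        old_kl, kl = kl_estimates(new_logprobs, old_logprobs)
        old_approx_kls.append(old_kl)
        approx_kls.append(kl)
    return {
        "losses/approx_kl_mean": fmean(approx_kls),
        "losses/approx_kl_max": max(approx_kls),
        "losses/old_approx_kl_mean": fmean(old_approx_kls),
        "losses/old_approx_kl_max": max(old_approx_kls),
    }


# Toy update phase with two minibatches (made-up log-probabilities).
update = [
    ([-1.0, -2.0], [-1.0, -2.0]),  # policy unchanged: both estimates are 0
    ([-0.5, -1.5], [-1.0, -2.0]),  # logratio is 0.5 for every sample
]
metrics = aggregate_kl(update)
for name, value in metrics.items():
    print(name, value)
```

In the training script, each entry would presumably be written with the existing TensorBoard writer, for example `writer.add_scalar(name, value, global_step)`, next to the current last-minibatch values.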

Types of changes

Bug fix
New feature
New algorithm
Documentation

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the tests accordingly (if applicable).
I have updated the documentation and previewed the changes via mkdocs serve.
- I have explained note-worthy implementation details.
- I have explained the logged metrics.
- I have added links to the original paper and related papers.

If you need to run benchmark experiments for performance-impacting changes:

I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture_video.
I have performed RLops with python -m openrlbenchmark.rlops.
- For new feature or bug fix:
  - I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
- For new algorithm:
  - I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
- I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

vercel · 2025-08-27T01:27:54Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
cleanrl	Ready	Preview	Comment	Aug 27, 2025 1:28am

Refs vwxyzjn#522

a7369db

vercel bot deployed to Preview August 27, 2025 01:28 View deployment

pjain03 changed the title ~~Refs #522~~ [Enhancement] Log mean/max KL across PPO update phase #522 Aug 27, 2025

pjain03 changed the title ~~[Enhancement] Log mean/max KL across PPO update phase #522~~ [Enhancement] Log mean/max KL across PPO update phase Aug 27, 2025
