pjain03 commented Aug 27, 2025

Description

In ppo.py (and ppo_continuous_action.py), losses/approx_kl and losses/old_approx_kl reflect only the last minibatch of the final epoch in each update. This PR proposes logging simple aggregates over the PPO update phase:

  • losses/approx_kl_mean (mean over all minibatches/epochs in the update)
  • losses/approx_kl_max (max over all minibatches/epochs)
  • losses/old_approx_kl_mean
  • losses/old_approx_kl_max

This is a small, non-performance-impacting quality-of-life improvement: these aggregates can make KL trends easier to read in noisier settings. No behavior changes; logging only.
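The aggregation described above could look roughly like the following. This is a minimal sketch in plain Python, not the diff in this PR: `ScalarAggregator` and `summarize_update` are hypothetical names introduced only for illustration, and the nested lists stand in for the per-minibatch KL estimates that the update loop computes in each epoch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class ScalarAggregator:
    """Hypothetical helper: collects one scalar per minibatch within one PPO update."""

    values: List[float] = field(default_factory=list)

    def add(self, value: float) -> None:
        self.values.append(float(value))

    def summary(self, prefix: str) -> Dict[str, float]:
        # Return nothing if the update phase produced no minibatches.
        if not self.values:
            return {}
        return {
            f"{prefix}_mean": sum(self.values) / len(self.values),
            f"{prefix}_max": max(self.values),
        }


def summarize_update(
    approx_kl: Sequence[Sequence[float]],
    old_approx_kl: Sequence[Sequence[float]],
) -> Dict[str, float]:
    """Assumed input layout: approx_kl[epoch][minibatch] (illustrative only)."""
    approx_agg, old_agg = ScalarAggregator(), ScalarAggregator()
    metrics: Dict[str, float] = {}
    for epoch_kl, epoch_old_kl in zip(approx_kl, old_approx_kl):
        for kl, old_kl in zip(epoch_kl, epoch_old_kl):
            approx_agg.add(kl)
            old_agg.add(old_kl)
            # Existing keys keep their current meaning: the last minibatch seen.
            metrics["losses/approx_kl"] = float(kl)
            metrics["losses/old_approx_kl"] = float(old_kl)
    metrics.update(approx_agg.summary("losses/approx_kl"))
    metrics.update(old_agg.summary("losses/old_approx_kl"))
    return metrics


if __name__ == "__main__":
    # Two epochs with two minibatches each (made-up numbers).
    demo = summarize_update(
        approx_kl=[[0.01, 0.03], [0.02, 0.005]],
        old_approx_kl=[[0.02, 0.04], [0.01, 0.006]],
    )
    for key, value in sorted(demo.items()):
        print(f"{key}: {value:.5f}")
```

In the training scripts, each entry of the returned dictionary would presumably be written once per update with the same logging call already used for the other losses/* scalars, so the only extra cost is appending two floats per minibatch.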

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the tests accordingly (if applicable).
  • I have updated the documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers.

If you need to run benchmark experiments for a performance-impacting change:

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
  • I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture_video.
  • I have performed RLops with python -m openrlbenchmark.rlops.
    • For new feature or bug fix:
      • I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
    • For new algorithm:
      • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
    • I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

vercel bot commented Aug 27, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| cleanrl | Ready | Preview | Comment | Aug 27, 2025 1:28am |

pjain03 changed the title from "Refs #522" to "[Enhancement] Log mean/max KL across PPO update phase #522" Aug 27, 2025
pjain03 changed the title from "[Enhancement] Log mean/max KL across PPO update phase #522" to "[Enhancement] Log mean/max KL across PPO update phase" Aug 27, 2025