-
Notifications
You must be signed in to change notification settings - Fork 865
SAC Documentation - Benchmarks - Minor code tweaks #146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This pull request is being automatically deployed with Vercel (learn more). 🔍 Inspect: https://vercel.com/vwxyzjn/cleanrl/rnPXKTft8t6hLfegqbnNkwBpG7W8 |
… SAC with fixed $\alpha$
Hello there. While I think this branch should be ready for review, it could not pass the pre-commit test due to some problem with the isort....................................................................Passed
autoflake................................................................Passed
black....................................................................Failed
- hook id: black
- exit code: 1
Traceback (most recent call last):
File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/bin/black", line 8, in <module>
sys.exit(patched_main())
File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/black/__init__.py", line 1423, in patched_main
patch_click()
File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/black/__init__.py", line 1409, in patch_click
from click import _unicodefun
ImportError: cannot import name '_unicodefun' from 'click' (/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/click/__init__.py)
codespell................................................................Passed |
Fixed with psf/black#2964 |
Thank you @dosssman. This is a really high-quality benchmark. I have asked @ikostrikov, who maintains https://github.com/ikostrikov/jaxrl, to help review this PR. Thanks @ikostrikov! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other thing. Would you mind customizing the chart a bit like the other charts in the benchmark? Use CleanRL's sac_continuous_action.py
instead of exp_name: sac_continuous_action
for the legend. I'd also change the line color to red for consistency.
Everything else looks good :) Feel free to merge once you have a chance to address these .
docs/rl-algorithms/sac.md
Outdated
|
||
## Overview | ||
|
||
The Soft Actor-Critic (SAC) algorithm extends the DDPG algorithms by 1) using a stochastic policy, which in theory can express multi-modal optimal policies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DDPG algorithms > DDPG algorithm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
…lot color changes -- mentions global gradient clipping
|
… added VAE paper citation
Oh man, I finally found the last reference for SAC I was looking for but could not remember: pranz24/pytorch-soft-actor-critic . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks really good now. Thank you @dosssman and feel free to merge.
Thanks for the review. Merging then. |
Description
qf_loss
computation: removed the/2
gradient scaling so that .backward() is more aligned with the theory. Instead, logqf_loss
= (qf1_loss + qf2_loss) / 2.` for meaningful comparison with mono Q-value network algorithms.Same is done in OpenAI SpinningUP for example.
Types of changes
Checklist:
pre-commit run --all-files
passes (required).mkdocs serve
.I have updated the tests accordingly (if applicable).If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.
--capture-video
flag toggled on (required).mkdocs serve
.width=500
andheight=300
).I have updated the tests accordingly (if applicable).