
Conversation

@sdpkjc (Collaborator) commented Mar 2, 2025

Description

Upgrade gymnasium to 1.0.0

  • gymnasium classic control

    • c51.py
    • c51_jax.py
    • dqn.py
    • dqn_jax.py
    • ppo.py
    • pqn.py
  • gymnasium mujoco

    • ddpg_continuous_action.py
    • ddpg_continuous_action_jax.py
    • td3_continuous_action.py
    • td3_continuous_action_jax.py
    • sac_continuous_action.py
    • ppo_continuous_action.py
    • rpo_continuous_action.py
  • gymnasium atari (EpisodicLifeEnv conflicts with gymnasium v1.0.0's RecordEpisodeStatistics and will be fixed later.)

    • c51_atari.py
    • c51_atari_jax.py
    • dqn_atari.py
    • dqn_atari_jax.py
    • qdagger_dqn_atari_impalacnn.py
    • qdagger_dqn_atari_jax_impalacnn.py
    • sac_atari.py
    • ppo_atari.py
    • ppo_atari_lstm.py
    • ppo_atari_multigpu.py
  • envpool

    • ppo_rnd_envpool.py
    • pqn_atari_envpool_lstm.py
    • pqn_atari_envpool.py
    • ppo_atari_envpool.py
    • ppo_atari_envpool_xla_jax.py
    • ppo_atari_envpool_xla_jax_scan.py
  • other

    • ppg_procgen.py
    • ppo_pettingzoo_ma_atari.py
    • ppo_procgen.py
    • ppo_trxl.py
    • ppo_continuous_action_isaacgym.py

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the tests accordingly (if applicable).
  • I have updated the documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers.

If you need to run benchmark experiments for a performance-impacting change:

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
  • I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture_video.
  • I have performed RLops with python -m openrlbenchmark.rlops.
    • For new feature or bug fix:
      • I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
    • For new algorithm:
      • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
    • I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.


@pseudo-rnd-thoughts (Collaborator) left a comment

Thanks for starting this @sdpkjc

Ale-py should be updated to v0.10.1

And the autoreset mode of the vector environment should be updated.

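For context, here is a minimal sketch of the behaviour change being referenced (illustrative only, not code from this PR): gymnasium v1.0's vector environments autoreset in next-step mode, so the terminating step() already returns the true final observation and the reset happens on the following call, whereas pre-1.0 vector environments reset in the same step and exposed the final observation through infos.

# Illustrative sketch (not from this PR): next-step autoreset in gymnasium v1.0.
import gymnasium as gym
import numpy as np

envs = gym.vector.SyncVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(2)])
obs, _ = envs.reset(seed=0)
for _ in range(200):
    actions = np.array([envs.single_action_space.sample() for _ in range(envs.num_envs)])
    obs, rewards, terminations, truncations, infos = envs.step(actions)
    # v1.0 (next-step): on a terminating step, obs already holds the true final
    # observation; the sub-env resets on the NEXT step() call instead.
    # <1.0 (same-step): the sub-env reset immediately, and the final observation
    # had to be recovered from infos["final_observation"].
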
pyproject.toml (Outdated)

- stable-baselines3 = "2.0.0"
- gymnasium = ">=0.28.1"
+ stable-baselines3 = ">=2.4.0"
+ gymnasium = ">=1.0.0"
Collaborator: Can this be specified as v1.1.0? If sb3 is the limitation, then I can see if I can update it.

Collaborator (Author): Yes, sb3 depends on gymnasium <=1.0.0.

@pseudo-rnd-thoughts (Collaborator) commented Mar 3, 2025

If I remember correctly, SB3 is used for the replay buffer and the Atari wrappers. IMO, those features can probably be shifted into cleanrl_utils directly; however, that should happen in a separate, later PR.

@sdpkjc changed the title from "Upgrade gymnasium to 1.1.0" to "Upgrade gymnasium to 1.0.0" on Mar 4, 2025


# Only for gymnasium v1.0.0
class SameModelSyncVectorEnv(gym.vector.SyncVectorEnv):
Collaborator: This should probably be called SameStepModeSyncVectorEnv, or we just shift to gymnasium v1.1.0.
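For illustration, a rough sketch of how such a wrapper could restore same-step semantics on top of v1.0's next-step SyncVectorEnv (an assumption-laden reconstruction, not the PR's actual implementation; the _autoreset_envs attribute is assumed to be SyncVectorEnv's internal next-step bookkeeping in v1.0):

# Rough sketch, not the PR's code: restore same-step autoreset on top of
# gymnasium v1.0's next-step SyncVectorEnv.
import gymnasium as gym
import numpy as np

class SameStepModeSyncVectorEnv(gym.vector.SyncVectorEnv):
    def step(self, actions):
        obs, rewards, terminations, truncations, infos = super().step(actions)
        dones = np.logical_or(terminations, truncations)
        if dones.any():
            infos["final_obs"] = obs.copy()  # keep the true final observations
            infos["_final_obs"] = dones
            for idx in np.where(dones)[0]:
                # Reset immediately (same-step) instead of on the next call.
                reset_obs, _ = self.envs[idx].reset()
                obs[idx] = reset_obs
                # Assumed v1.0 internal flag; cleared to avoid a double reset.
                self._autoreset_envs[idx] = False
        return obs, rewards, terminations, truncations, infos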


[tool.poetry.dependencies]
- python = ">=3.8,<3.11"
+ python = ">=3.9,<3.11"
Collaborator: What is limiting us from increasing this?

torch = ">=1.12.1"
- stable-baselines3 = "2.0.0"
- gymnasium = ">=0.28.1"
+ stable-baselines3 = "^2.4.0"
Collaborator: I suspect we will have a new release of sb3 with support for gymnasium v1.1.0, as no changes seem to be required on their end (DLR-RM/stable-baselines3#2095).

@ghost commented Mar 20, 2025

Hey, with the updates in gymnasium 1.1, would it not be easier to simply use the 'Same-Step Mode', or am I missing something?

Does it have to do with the support for the other wrappers that are only supported in the 'Next step' mode?

@pseudo-rnd-thoughts (Collaborator)

@MarcusBinderDTU it is more about minimising implementation changes.
The old implementation used same-step; therefore, as this is a module update, we are trying to minimise code changes.
Moving to next-step (which could be beneficial for the PPO implementations) would be a separate PR.

@ghost commented Mar 20, 2025

> @MarcusBinderDTU it is more about minimising implementation changes. The old implementation used same-step; therefore, as this is a module update, we are trying to minimise code changes. Moving to next-step (which could be beneficial for the PPO implementations) would be a separate PR.

Thanks for the fast reply!

I agree, but I don't understand why we don't go directly to gymnasium 1.1 and then use autoreset_mode=gym.vector.AutoresetMode.SAME_STEP for environment creation.

Would that not be the easiest way of doing it?
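For reference, a hedged sketch of what that route would look like, assuming gymnasium v1.1's documented AutoresetMode API:

# Hedged sketch, assuming gymnasium v1.1's AutoresetMode API:
import gymnasium as gym
from gymnasium.vector import AutoresetMode, SyncVectorEnv

envs = SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(4)],
    autoreset_mode=AutoresetMode.SAME_STEP,  # restores the pre-1.0 same-step behaviour
)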

@sdpkjc (Collaborator, Author) commented Mar 20, 2025

Thanks for your suggestion. However, since the currently released version of sb3 depends on gymnasium < 1.1, we can’t upgrade to 1.1 directly. Once #505 is merged, we'll remove the sb3 dependency and then update to gymnasium 1.1, which will allow us to use autoreset_mode=gym.vector.AutoresetMode.SAME_STEP for environment creation. So this PR is on hold until #505 goes in.

@ghost commented Mar 20, 2025

Ahh, now I see! Thanks for clarifying it, that makes sense :)

@jugheadjones10 commented May 8, 2025

@sdpkjc
For this line:
real_next_obs[idx] = infos["final_observation"][idx]
I think "final_observation" should be "final_obs".
Running it, "final_observation" raises a KeyError, while "final_obs" does not.
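A minimal sketch of the corrected lookup in the usual CleanRL replay-buffer pattern (key names as reported in this thread, not verified against the final diff):

# Sketch of the fix in the usual CleanRL pattern (key name per this thread):
real_next_obs = next_obs.copy()
for idx, trunc in enumerate(truncations):
    if trunc:
        real_next_obs[idx] = infos["final_obs"][idx]  # was infos["final_observation"]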

@jugheadjones10

@sdpkjc I noticed that the scripts in cleanrl_utils/evals also still use the old way of getting episodic returns:

if "final_info" in infos:
    for info in infos["final_info"]:
        if "episode" not in info:

So we might need to update all the files in this directory too!
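For comparison, a hedged sketch of the v1.0-style check with the vector RecordEpisodeStatistics wrapper, which reports episode statistics as arrays plus a boolean mask rather than per-env info dicts:

# Hedged sketch of the gymnasium v1.0-style logging (the vector
# RecordEpisodeStatistics wrapper returns arrays plus an "_episode" mask):
if "episode" in infos:
    for i in range(envs.num_envs):
        if infos["_episode"][i]:
            print(f"episodic_return={infos['episode']['r'][i]}")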

@varadVaidya

Do you think it's a good idea to let users know that the logging for gymnasium >= 1.0 is different until the PR is merged? I ran into some puzzling errors, and it took more time than it should have, since the change in reward logging is not mentioned prominently in the gymnasium changelogs.

@pseudo-rnd-thoughts mentioned this pull request on Jul 4, 2025
@pseudo-rnd-thoughts (Collaborator)
Closing in favour of #516
