Open
Labels
bug: Something isn't working
Description
Describe the bug
I’m using the gin_rummy_v4 environment from PettingZoo’s classic set (which wraps the RLCard implementation), and I’ve found two related issues:
- The seeding behavior of the environment does not appear to follow PettingZoo’s usual env.reset(seed=…) semantics. The underlying RLCard game seems to be created via rlcard.make(…), so the seed passed to PettingZoo is not consistently propagated to the RLCard backend.
- Likely because of the first issue, the payoff/reward defaults appear inconsistent with the defaults documented in PettingZoo. The documentation for the Gin Rummy environment (wrapped by RLCard) states that the default knock reward is 0.5, but the underlying RLCard defaults use 0.2. As a result, I see “knock amount = 0.2” instead of 0.5 with default settings, and even with custom settings.
Together, these two issues (seed not respected and inconsistent reward default) cause reproducibility problems and unexpected payoff behavior when the environment is used for research or benchmarking.
Code example
from pettingzoo.classic import gin_rummy_v4
seed = 42  # any fixed integer; the original snippet left `seed` undefined
env = gin_rummy_v4.env()
env.reset(seed=seed)
System info
- PettingZoo version: (pettingzoo[classic]>=1.24.3)
- RLCard version: (ray[rllib]>=2.31.0)
- Python version: (e.g., 3.11.14)
- OS: (macOS 26, Linux 24.04)
Additional context
Suggested Fix / Ask:
- Ensure that in the gin_rummy wrapper (and any other RLCard-wrapped environment) the reset(seed=…) value is passed properly to RLCard’s make or internal RNG so that the seed is effective and reproducible.
- Force the default knock_reward and gin_reward to the documented values (0.5 and 1.0 respectively) regardless of RLCard’s internal defaults – or at minimum document clearly when RLCard defaults differ.
- Add automated tests to verify that the seed parameter produces deterministic results (e.g., fixed draws, shuffle order) and that reward defaults match documentation.
- Optionally, update the documentation to more clearly state that because RLCard is used under the hood, some internal defaults (like 0.2) may still apply unless overridden, or update the wrapper to guarantee the override.
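The determinism test suggested above could look roughly like the following. This is a minimal sketch, not the project's actual test code: `rollout_fingerprint` and `check_gin_rummy` are hypothetical helper names, and the sketch assumes the standard PettingZoo AEC loop (`agent_iter`, `last`, `step`) and that each observation is a dict with an `"observation"` array (exposing `.tobytes()`) and an `"action_mask"`.

```python
import hashlib
import random


def rollout_fingerprint(env, seed, max_steps=500):
    """Reset `env` with `seed`, play seeded random legal moves, hash what is observed.

    Hypothetical helper: two calls with the same seed should return the same
    digest if the seed actually reaches the underlying game.
    """
    policy_rng = random.Random(seed)  # separate RNG so the action choice is reproducible too
    digest = hashlib.sha256()
    env.reset(seed=seed)
    for step, agent in enumerate(env.agent_iter()):
        obs, reward, terminated, truncated, _info = env.last()
        # Fold the acting agent, its observation and its reward into the hash.
        digest.update(agent.encode())
        digest.update(obs["observation"].tobytes())
        digest.update(repr(float(reward)).encode())
        if step >= max_steps:
            break
        if terminated or truncated:
            env.step(None)  # finished agents must be stepped with None
            continue
        # Assumption: the mask marks legal actions with truthy entries.
        legal = [i for i, ok in enumerate(obs["action_mask"]) if ok]
        env.step(policy_rng.choice(legal))
    return digest.hexdigest()


def check_gin_rummy(seed=42):
    """Return True if two fresh gin_rummy_v4 envs replay identically for one seed."""
    from pettingzoo.classic import gin_rummy_v4  # imported lazily; needs pettingzoo[classic]

    first = rollout_fingerprint(gin_rummy_v4.env(), seed)
    second = rollout_fingerprint(gin_rummy_v4.env(), seed)
    return first == second
```

If `check_gin_rummy()` returns False, the deal or draw order differs between two runs with the same seed, which would be consistent with the seed not reaching the RLCard backend. Comparing fingerprints for two different seeds is a useful sanity check in the other direction.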
Checklist
- I have checked that there is no similar issue in the repo