Commit 9c97414

Add new MuJoCo bindings
* update new mujoco bindings
* optional ctc_force ant-v4
* force changes
* contact force weight
* add ctc force range
* mujoco v3 skip test
* doc Ant-v4
* inverted pendulum limit control
1 parent: 8f9b62f


43 files changed (+3328 additions, -1522 deletions)

.github/workflows/build.yml

Lines changed: 0 additions & 1 deletion
```diff
@@ -11,7 +11,6 @@ jobs:
       - uses: actions/checkout@v2
       - run: |
           docker build -f py.Dockerfile \
-            --build-arg MUJOCO_KEY=$MUJOCO_KEY \
             --build-arg PYTHON_VERSION=${{ matrix.python-version }} \
             --tag gym-docker .
       - name: Run tests
```

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -9,7 +9,7 @@ repos:
     hooks:
       - id: codespell
         args:
-          - --ignore-words-list=nd,reacher,thist,ths
+          - --ignore-words-list=nd,reacher,thist,ths, ure, referenc
   - repo: https://gitlab.com/PyCQA/flake8
     rev: 4.0.1
     hooks:
```

README.md

Lines changed: 5 additions & 0 deletions
```diff
@@ -46,6 +46,11 @@ env.close()
 
 Gym keeps strict versioning for reproducibility reasons. All environments end in a suffix like "\_v0". When changes are made to environments that might impact learning results, the number is increased by one to prevent potential confusion.
 
+## MuJoCo Environments
+
+The latest "\_v4" and future versions of the MuJoCo environments will no longer depend on `mujoco-py`. Instead `mujoco` will be the required dependency for future gym MuJoCo environment versions. Old gym MuJoCo environment versions that depend on `mujoco-py` will still be kept but unmaintained.
+To install the dependencies for the latest gym MuJoCo environments use `pip install gym[mujoco]`. Dependencies for old MuJoCo environments can still be installed by `pip install gym[mujoco_py]`.
+
 ## Citation
 
 A whitepaper from when Gym just came out is available https://arxiv.org/pdf/1606.01540, and can be cited with the following bibtex entry:
```
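The README change ties the required package to the version suffix. From `_v4` onward the environments use `mujoco`, and older versions use `mujoco-py`. The helper below is a hypothetical sketch of that rule and is not part of gym. It maps an id such as `Ant-v4` to the import name and the pip target the README gives. It assumes ids follow the `Name-vN` pattern used in `gym/envs/__init__.py`.

```python
import re
from typing import Tuple

# Hypothetical helper (not part of gym): encodes the README rule that
# "_v4" and later MuJoCo environments use `mujoco`, older ones `mujoco-py`.
_ENV_ID = re.compile(r"(?P<name>[A-Za-z0-9]+)-v(?P<version>\d+)")


def mujoco_requirement(env_id: str) -> Tuple[str, str]:
    """Return (import name, pip install target) for an id like 'Ant-v4'."""
    match = _ENV_ID.fullmatch(env_id)
    if match is None:
        raise ValueError(f"unexpected environment id: {env_id!r}")
    if int(match["version"]) >= 4:
        return "mujoco", "gym[mujoco]"
    return "mujoco_py", "gym[mujoco_py]"


if __name__ == "__main__":
    for env_id in ("Ant-v4", "HalfCheetah-v3", "InvertedPendulum-v2"):
        print(env_id, mujoco_requirement(env_id))
```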

gym/envs/__init__.py

Lines changed: 74 additions & 0 deletions
```diff
@@ -162,27 +162,55 @@
     reward_threshold=-3.75,
 )
 
+register(
+    id="Reacher-v4",
+    entry_point="gym.envs.mujoco.reacher_v4:ReacherEnv",
+    max_episode_steps=50,
+    reward_threshold=-3.75,
+)
+
 register(
     id="Pusher-v2",
     entry_point="gym.envs.mujoco:PusherEnv",
     max_episode_steps=100,
     reward_threshold=0.0,
 )
 
+register(
+    id="Pusher-v4",
+    entry_point="gym.envs.mujoco.pusher_v4:PusherEnv",
+    max_episode_steps=100,
+    reward_threshold=0.0,
+)
+
 register(
     id="InvertedPendulum-v2",
     entry_point="gym.envs.mujoco:InvertedPendulumEnv",
     max_episode_steps=1000,
     reward_threshold=950.0,
 )
 
+register(
+    id="InvertedPendulum-v4",
+    entry_point="gym.envs.mujoco.inverted_pendulum_v4:InvertedPendulumEnv",
+    max_episode_steps=1000,
+    reward_threshold=950.0,
+)
+
 register(
     id="InvertedDoublePendulum-v2",
     entry_point="gym.envs.mujoco:InvertedDoublePendulumEnv",
     max_episode_steps=1000,
     reward_threshold=9100.0,
 )
 
+register(
+    id="InvertedDoublePendulum-v4",
+    entry_point="gym.envs.mujoco.inverted_double_pendulum_v4:InvertedDoublePendulumEnv",
+    max_episode_steps=1000,
+    reward_threshold=9100.0,
+)
+
 register(
     id="HalfCheetah-v2",
     entry_point="gym.envs.mujoco:HalfCheetahEnv",
@@ -197,6 +225,13 @@
     reward_threshold=4800.0,
 )
 
+register(
+    id="HalfCheetah-v4",
+    entry_point="gym.envs.mujoco.half_cheetah_v4:HalfCheetahEnv",
+    max_episode_steps=1000,
+    reward_threshold=4800.0,
+)
+
 register(
     id="Hopper-v2",
     entry_point="gym.envs.mujoco:HopperEnv",
@@ -211,6 +246,13 @@
     reward_threshold=3800.0,
 )
 
+register(
+    id="Hopper-v4",
+    entry_point="gym.envs.mujoco.hopper_v4:HopperEnv",
+    max_episode_steps=1000,
+    reward_threshold=3800.0,
+)
+
 register(
     id="Swimmer-v2",
     entry_point="gym.envs.mujoco:SwimmerEnv",
@@ -225,6 +267,13 @@
     reward_threshold=360.0,
 )
 
+register(
+    id="Swimmer-v4",
+    entry_point="gym.envs.mujoco.swimmer_v4:SwimmerEnv",
+    max_episode_steps=1000,
+    reward_threshold=360.0,
+)
+
 register(
     id="Walker2d-v2",
     max_episode_steps=1000,
@@ -237,6 +286,12 @@
     entry_point="gym.envs.mujoco.walker2d_v3:Walker2dEnv",
 )
 
+register(
+    id="Walker2d-v4",
+    max_episode_steps=1000,
+    entry_point="gym.envs.mujoco.walker2d_v4:Walker2dEnv",
+)
+
 register(
     id="Ant-v2",
     entry_point="gym.envs.mujoco:AntEnv",
@@ -251,6 +306,13 @@
     reward_threshold=6000.0,
 )
 
+register(
+    id="Ant-v4",
+    entry_point="gym.envs.mujoco.ant_v4:AntEnv",
+    max_episode_steps=1000,
+    reward_threshold=6000.0,
+)
+
 register(
     id="Humanoid-v2",
     entry_point="gym.envs.mujoco:HumanoidEnv",
@@ -263,8 +325,20 @@
     max_episode_steps=1000,
 )
 
+register(
+    id="Humanoid-v4",
+    entry_point="gym.envs.mujoco.humanoid_v4:HumanoidEnv",
+    max_episode_steps=1000,
+)
+
 register(
     id="HumanoidStandup-v2",
     entry_point="gym.envs.mujoco:HumanoidStandupEnv",
     max_episode_steps=1000,
 )
+
+register(
+    id="HumanoidStandup-v4",
+    entry_point="gym.envs.mujoco.humanoidstandup_v4:HumanoidStandupEnv",
+    max_episode_steps=1000,
+)
```
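Each v4 registration points `entry_point` at its own module, for example `gym.envs.mujoco.ant_v4:AntEnv`. The v2 entries use a package-level path such as `gym.envs.mujoco:AntEnv`. A `module:attribute` string like this can be resolved with `importlib`. The sketch below illustrates the idea under the hypothetical name `load_entry_point`. It is not gym's own loader.

```python
import importlib
from typing import Any


def load_entry_point(entry_point: str) -> Any:
    """Resolve a 'package.module:Attribute' string to the named object."""
    module_name, sep, attr_name = entry_point.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module:attribute', got {entry_point!r}")
    # The module is imported only when the entry point is resolved.
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


if __name__ == "__main__":
    # A standard-library target keeps this runnable without gym or MuJoCo installed.
    print(load_entry_point("collections:OrderedDict").__name__)
```

The sketch resolves the string when an environment is requested, not when it is registered. As a result, a registration line by itself does not import the module it names.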

gym/envs/mujoco/__init__.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -9,6 +9,7 @@
 from gym.envs.mujoco.humanoidstandup import HumanoidStandupEnv
 from gym.envs.mujoco.inverted_double_pendulum import InvertedDoublePendulumEnv
 from gym.envs.mujoco.inverted_pendulum import InvertedPendulumEnv
+from gym.envs.mujoco.mujoco_rendering import RenderContextOffscreen, Viewer
 from gym.envs.mujoco.pusher import PusherEnv
 from gym.envs.mujoco.reacher import ReacherEnv
 from gym.envs.mujoco.swimmer import SwimmerEnv
```

gym/envs/mujoco/ant.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -6,7 +6,7 @@
 
 class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
     def __init__(self):
-        mujoco_env.MujocoEnv.__init__(self, "ant.xml", 5)
+        mujoco_env.MujocoEnv.__init__(self, "ant.xml", 5, mujoco_bindings="mujoco_py")
         utils.EzPickle.__init__(self)
 
     def step(self, a):
```
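The old `ant.py` and `ant_v3.py` environments now pass `mujoco_bindings="mujoco_py"` to `MujocoEnv.__init__`. This keeps the pre-v4 environments on the old bindings. The class below is a hypothetical sketch of how such a keyword could be validated and the chosen package imported only when needed. `BindingsSelector`, its methods, and the error messages are illustrative assumptions, not the `MujocoEnv` implementation.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional

# The two binding packages named in this commit.
SUPPORTED_BINDINGS = ("mujoco", "mujoco_py")


class BindingsSelector:
    """Hypothetical helper: remember which MuJoCo bindings to use and import them lazily."""

    def __init__(self, mujoco_bindings: str) -> None:
        if mujoco_bindings not in SUPPORTED_BINDINGS:
            raise ValueError(
                f"mujoco_bindings must be one of {SUPPORTED_BINDINGS}, got {mujoco_bindings!r}"
            )
        self.mujoco_bindings = mujoco_bindings
        self._module: Optional[ModuleType] = None

    def is_available(self) -> bool:
        # find_spec only locates the package; it does not import it.
        return importlib.util.find_spec(self.mujoco_bindings) is not None

    def module(self) -> ModuleType:
        # Import on first use and cache the module afterwards.
        if self._module is None:
            if not self.is_available():
                raise ImportError(
                    f"{self.mujoco_bindings} is not installed; "
                    f"try `pip install gym[{self.mujoco_bindings}]`"
                )
            self._module = importlib.import_module(self.mujoco_bindings)
        return self._module
```

Because the import happens inside `module()`, an environment that asks for `mujoco_py` never imports `mujoco`, and vice versa.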

gym/envs/mujoco/ant_v3.py

Lines changed: 1 addition & 162 deletions
```diff
@@ -9,167 +9,6 @@
 
 
 class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
-    """
-    ### Description
-
-    This environment is based on the environment introduced by Schulman,
-    Moritz, Levine, Jordan and Abbeel in ["High-Dimensional Continuous Control
-    Using Generalized Advantage Estimation"](https://arxiv.org/abs/1506.02438).
-    The ant is a 3D robot consisting of one torso (free rotational body) with
-    four legs attached to it with each leg having two links. The goal is to
-    coordinate the four legs to move in the forward (right) direction by applying
-    torques on the eight hinges connecting the two links of each leg and the torso
-    (nine parts and eight hinges).
-
-    ### Action Space
-    The action space is a `Box(-1, 1, (8,), float32)`. An action represents the torques applied at the hinge joints.
-
-    | Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |
-    |-----|----------------------|---------------|----------------|---------------------------------------|-------|------|
-    | 0 | Torque applied on the rotor between the torso and front left hip | -1 | 1 | hip_1 (front_left_leg) | hinge | torque (N m) |
-    | 1 | Torque applied on the rotor between the front left two links | -1 | 1 | angle_1 (front_left_leg) | hinge | torque (N m) |
-    | 2 | Torque applied on the rotor between the torso and front right hip | -1 | 1 | hip_2 (front_right_leg) | hinge | torque (N m) |
-    | 3 | Torque applied on the rotor between the front right two links | -1 | 1 | angle_2 (front_right_leg) | hinge | torque (N m) |
-    | 4 | Torque applied on the rotor between the torso and back left hip | -1 | 1 | hip_3 (back_leg) | hinge | torque (N m) |
-    | 5 | Torque applied on the rotor between the back left two links | -1 | 1 | angle_3 (back_leg) | hinge | torque (N m) |
-    | 6 | Torque applied on the rotor between the torso and back right hip | -1 | 1 | hip_4 (right_back_leg) | hinge | torque (N m) |
-    | 7 | Torque applied on the rotor between the back right two links | -1 | 1 | angle_4 (right_back_leg) | hinge | torque (N m) |
-
```
```diff
-    ### Observation Space
-
-    Observations consist of positional values of different body parts of the ant,
-    followed by the velocities of those individual parts (their derivatives) with all
-    the positions ordered before all the velocities.
-
-    By default, observations do not include the x- and y-coordinates of the ant's torso. These may
-    be included by passing `exclude_current_positions_from_observation=False` during construction.
-    In that case, the observation space will have 113 dimensions where the first two dimensions
-    represent the x- and y- coordinates of the ant's torso.
-    Regardless of whether `exclude_current_positions_from_observation` was set to true or false, the x- and y-coordinates
-    of the torso will be returned in `info` with keys `"x_position"` and `"y_position"`, respectively.
-
-    However, by default, an observation is a `ndarray` with shape `(111,)`
-    where the elements correspond to the following:
-
-    | Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
-    |-----|-------------------------------------------------------------|----------------|-----------------|----------------------------------------|-------|------|
-    | 0 | z-coordinate of the torso (centre) | -Inf | Inf | torso | free | position (m) |
-    | 1 | x-orientation of the torso (centre) | -Inf | Inf | torso | free | angle (rad) |
-    | 2 | y-orientation of the torso (centre) | -Inf | Inf | torso | free | angle (rad) |
-    | 3 | z-orientation of the torso (centre) | -Inf | Inf | torso | free | angle (rad) |
-    | 4 | w-orientation of the torso (centre) | -Inf | Inf | torso | free | angle (rad) |
-    | 5 | angle between torso and first link on front left | -Inf | Inf | hip_1 (front_left_leg) | hinge | angle (rad) |
-    | 6 | angle between the two links on the front left | -Inf | Inf | ankle_1 (front_left_leg) | hinge | angle (rad) |
-    | 7 | angle between torso and first link on front right | -Inf | Inf | hip_2 (front_right_leg) | hinge | angle (rad) |
-    | 8 | angle between the two links on the front right | -Inf | Inf | ankle_2 (front_right_leg) | hinge | angle (rad) |
-    | 9 | angle between torso and first link on back left | -Inf | Inf | hip_3 (back_leg) | hinge | angle (rad) |
-    | 10 | angle between the two links on the back left | -Inf | Inf | ankle_3 (back_leg) | hinge | angle (rad) |
-    | 11 | angle between torso and first link on back right | -Inf | Inf | hip_4 (right_back_leg) | hinge | angle (rad) |
-    | 12 | angle between the two links on the back right | -Inf | Inf | ankle_4 (right_back_leg) | hinge | angle (rad) |
-    | 13 | x-coordinate velocity of the torso | -Inf | Inf | torso | free | velocity (m/s) |
-    | 14 | y-coordinate velocity of the torso | -Inf | Inf | torso | free | velocity (m/s) |
-    | 15 | z-coordinate velocity of the torso | -Inf | Inf | torso | free | velocity (m/s) |
-    | 16 | x-coordinate angular velocity of the torso | -Inf | Inf | torso | free | angular velocity (rad/s) |
-    | 17 | y-coordinate angular velocity of the torso | -Inf | Inf | torso | free | angular velocity (rad/s) |
-    | 18 | z-coordinate angular velocity of the torso | -Inf | Inf | torso | free | angular velocity (rad/s) |
-    | 19 | angular velocity of angle between torso and front left link | -Inf | Inf | hip_1 (front_left_leg) | hinge | angle (rad) |
-    | 20 | angular velocity of the angle between front left links | -Inf | Inf | ankle_1 (front_left_leg) | hinge | angle (rad) |
-    | 21 | angular velocity of angle between torso and front right link| -Inf | Inf | hip_2 (front_right_leg) | hinge | angle (rad) |
-    | 22 | angular velocity of the angle between front right links | -Inf | Inf | ankle_2 (front_right_leg) | hinge | angle (rad) |
-    | 23 | angular velocity of angle between torso and back left link | -Inf | Inf | hip_3 (back_leg) | hinge | angle (rad) |
-    | 24 | angular velocity of the angle between back left links | -Inf | Inf | ankle_3 (back_leg) | hinge | angle (rad) |
-    | 25 | angular velocity of angle between torso and back right link | -Inf | Inf | hip_4 (right_back_leg) | hinge | angle (rad) |
-    | 26 |angular velocity of the angle between back right links | -Inf | Inf | ankle_4 (right_back_leg) | hinge | angle (rad) |
-
-
-    The remaining 14*6 = 84 elements of the observation are contact forces
-    (external forces - force x, y, z and torque x, y, z) applied to the
-    center of mass of each of the links. The 14 links are: the ground link,
-    the torso link, and 3 links for each leg (1 + 1 + 12) with the 6 external forces.
-
-    The (x,y,z) coordinates are translational DOFs while the orientations are rotational
-    DOFs expressed as quaternions. One can read more about free joints on the [Mujoco Documentation](https://mujoco.readthedocs.io/en/latest/XMLreference.html).
-
-
-    **Note:** There have been reported issues that using a Mujoco-Py version > 2.0 results
-    in the contact forces always being 0. As such we recommend to use a Mujoco-Py version < 2.0
-    when using the Ant environment if you would like to report results with contact forces (if
-    contact forces are not used in your experiments, you can use version > 2.0).
-
```
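The observation layout in this removed docstring adds up as follows. There are 15 `qpos` entries: 3 torso coordinates, a 4-component orientation quaternion, and 8 hinge angles. There are 14 `qvel` entries: 6 torso velocities and 8 hinge velocities. Finally there are 14 × 6 = 84 contact-force values. Dropping the torso x and y gives 13 + 14 + 84 = 111, and keeping them gives 113. The sketch below checks that arithmetic and slices a flat observation. The constant names and `split_observation` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Sizes taken from the Ant docstring; the names are illustrative only.
NQ = 15          # torso x, y, z + 4-component quaternion + 8 hinge angles
NV = 14          # torso linear (3) + angular (3) velocities + 8 hinge velocities
NUM_BODIES = 14  # ground + torso + 3 per leg (1 + 1 + 12)
FORCE_DIMS = 6   # force x, y, z and torque x, y, z per body


def ant_obs_dim(exclude_current_positions: bool = True) -> int:
    """Length of the flat observation vector."""
    n_pos = NQ - 2 if exclude_current_positions else NQ
    return n_pos + NV + NUM_BODIES * FORCE_DIMS


def split_observation(
    obs: Sequence[float], exclude_current_positions: bool = True
) -> Tuple[List[float], List[float], List[float]]:
    """Split a flat observation into (positions, velocities, contact forces)."""
    if len(obs) != ant_obs_dim(exclude_current_positions):
        raise ValueError(f"unexpected observation length {len(obs)}")
    n_pos = NQ - 2 if exclude_current_positions else NQ
    positions = list(obs[:n_pos])
    velocities = list(obs[n_pos : n_pos + NV])
    contact_forces = list(obs[n_pos + NV :])
    return positions, velocities, contact_forces


if __name__ == "__main__":
    print(ant_obs_dim(), ant_obs_dim(exclude_current_positions=False))  # 111 113
```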
```diff
-    ### Rewards
-    The reward consists of three parts:
-    - *healthy_reward*: Every timestep that the ant is healthy (see definition in section "Episode Termination"), it gets a reward of fixed value `healthy_reward`
-    - *forward_reward*: A reward of moving forward which is measured as
-    *(x-coordinate before action - x-coordinate after action)/dt*. *dt* is the time
-    between actions and is dependent on the `frame_skip` parameter (default is 5),
-    where the frametime is 0.01 - making the default *dt = 5 * 0.01 = 0.05*.
-    This reward would be positive if the ant moves forward (in positive x direction).
-    - *ctrl_cost*: A negative reward for penalising the ant if it takes actions
-    that are too large. It is measured as *`ctrl_cost_weight` * sum(action<sup>2</sup>)*
-    where *`ctr_cost_weight`* is a parameter set for the control and has a default value of 0.5.
-    - *contact_cost*: A negative reward for penalising the ant if the external contact
-    force is too large. It is calculated *`contact_cost_weight` * sum(clip(external contact
-    force to `contact_force_range`)<sup>2</sup>)*.
-
-    The total reward returned is ***reward*** *=* *healthy_reward + forward_reward - ctrl_cost - contact_cost* and `info` will also contain the individual reward terms.
-
```
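The reward terms in this docstring can be sketched in plain Python. Two assumptions need flagging. First, the forward term is computed as (x after - x before) / dt. That order gives a positive reward when the ant moves in +x, as the text says; the docstring formula writes the difference the other way round. Second, `healthy_reward` is paid only on healthy steps, as the first bullet describes. `ant_reward` and its argument names are hypothetical. The defaults come from the same docstring's Arguments table.

```python
from typing import Dict, Sequence, Tuple


def ant_reward(
    x_before: float,
    x_after: float,
    action: Sequence[float],
    contact_forces: Sequence[float],
    is_healthy: bool = True,
    frame_skip: int = 5,
    timestep: float = 0.01,
    healthy_reward: float = 1.0,
    ctrl_cost_weight: float = 0.5,
    contact_cost_weight: float = 5e-4,
    contact_force_range: Tuple[float, float] = (-1.0, 1.0),
) -> Tuple[float, Dict[str, float]]:
    """Hypothetical sketch: healthy_reward + forward_reward - ctrl_cost - contact_cost."""
    dt = frame_skip * timestep  # 5 * 0.01 = 0.05 s with the defaults
    forward_reward = (x_after - x_before) / dt  # positive when moving in +x
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    low, high = contact_force_range
    clipped = (min(max(f, low), high) for f in contact_forces)  # clip each force component
    contact_cost = contact_cost_weight * sum(f * f for f in clipped)
    alive_bonus = healthy_reward if is_healthy else 0.0
    reward = alive_bonus + forward_reward - ctrl_cost - contact_cost
    terms = {
        "healthy_reward": alive_bonus,
        "forward_reward": forward_reward,
        "ctrl_cost": ctrl_cost,
        "contact_cost": contact_cost,
    }
    return reward, terms
```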
```diff
-    ### Starting State
-    All observations start in state
-    (0.0, 0.0, 0.75, 1.0, 0.0 ... 0.0) with a uniform noise in the range
-    of [-`reset_noise_scale`, `reset_noise_scale`] added to the positional values and standard normal noise
-    with mean 0 and standard deviation `reset_noise_scale` added to the velocity values for
-    stochasticity. Note that the initial z coordinate is intentionally selected
-    to be slightly high, thereby indicating a standing up ant. The initial orientation
-    is designed to make it face forward as well.
-
```
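The reset in this docstring adds uniform noise to positions and Gaussian noise, scaled by `reset_noise_scale`, to velocities. It can be sketched with the standard library's `random` module. `reset_state` is a hypothetical name. The initial vector assumes the 15-entry `qpos` layout implied by the observation section. It uses 0.75 as the torso height and 1.0 as the first quaternion component (MuJoCo stores quaternions as w, x, y, z).

```python
import random
from typing import List, Optional, Sequence, Tuple

# Initial qpos from the docstring: x, y, z, quaternion (w, x, y, z), 8 hinge angles.
INIT_QPOS = (0.0, 0.0, 0.75, 1.0) + (0.0,) * 11
INIT_QVEL = (0.0,) * 14


def reset_state(
    init_qpos: Sequence[float] = INIT_QPOS,
    init_qvel: Sequence[float] = INIT_QVEL,
    reset_noise_scale: float = 0.1,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float]]:
    """Hypothetical sketch of the randomized starting state."""
    rng = rng if rng is not None else random.Random()
    s = reset_noise_scale
    # Positions: uniform noise in [-s, s].
    qpos = [q + rng.uniform(-s, s) for q in init_qpos]
    # Velocities: zero-mean Gaussian noise with standard deviation s.
    qvel = [v + s * rng.gauss(0.0, 1.0) for v in init_qvel]
    return qpos, qvel
```

Passing a seeded `random.Random` makes the reset reproducible, which helps when comparing runs.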
```diff
-    ### Episode Termination
-    The ant is said to be unhealthy if any of the following happens:
-
-    1. Any of the state space values is no longer finite
-    2. The z-coordinate of the torso is **not** in the closed interval given by `healthy_z_range` (defaults to [0.2, 1.0])
-
-    If `terminate_when_unhealthy=True` is passed during construction (which is the default),
-    the episode terminates when any of the following happens:
-
-    1. The episode duration reaches a 1000 timesteps
-    2. The ant is unhealthy
-
-    If `terminate_when_unhealthy=False` is passed, the episode is terminated only when 1000 timesteps are exceeded.
-
```
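The health check and termination rule in this docstring reduce to two small predicates. In the sketch below, `is_healthy` and `is_terminated` are hypothetical names. The sketch checks that every state value is finite and that the torso z-coordinate lies in the closed `healthy_z_range`. It leaves out the 1000-step cap, which is handled separately through `max_episode_steps=1000` in the registrations in `gym/envs/__init__.py`.

```python
import math
from typing import Sequence, Tuple


def is_healthy(
    state: Sequence[float],
    torso_z: float,
    healthy_z_range: Tuple[float, float] = (0.2, 1.0),
) -> bool:
    """Unhealthy if any state value is non-finite or torso z leaves the closed range."""
    min_z, max_z = healthy_z_range
    return all(math.isfinite(v) for v in state) and min_z <= torso_z <= max_z


def is_terminated(healthy: bool, terminate_when_unhealthy: bool = True) -> bool:
    """End the episode early only when requested and the ant is unhealthy."""
    return terminate_when_unhealthy and not healthy
```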
```diff
-    ### Arguments
-
-    No additional arguments are currently supported in v2 and lower.
-
-    ```
-    env = gym.make('Ant-v2')
-    ```
-
-    v3 and beyond take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.
-
-    ```
-    env = gym.make('Ant-v3', ctrl_cost_weight=0.1, ...)
-    ```
-
-    | Parameter | Type | Default |Description |
-    |-------------------------|------------|--------------|-------------------------------|
-    | `xml_file` | **str** | `"ant.xml"` | Path to a MuJoCo model |
-    | `ctrl_cost_weight` | **float** | `0.5` | Weight for *ctrl_cost* term (see section on reward) |
-    | `contact_cost_weight` | **float** | `5e-4` | Weight for *contact_cost* term (see section on reward) |
-    | `healthy_reward` | **float** | `1` | Constant reward given if the ant is "healthy" after timestep |
-    | `terminate_when_unhealthy` | **bool**| `True` | If true, issue a done signal if the z-coordinate of the torso is no longer in the `healthy_z_range` |
-    | `healthy_z_range` | **tuple** | `(0.2, 1)` | The ant is considered healthy if the z-coordinate of the torso is in this range |
-    | `contact_force_range` | **tuple** | `(-1, 1)` | Contact forces are clipped to this range in the computation of *contact_cost* |
-    | `reset_noise_scale` | **float** | `0.1` | Scale of random perturbations of initial position and velocity (see section on Starting State) |
-    | `exclude_current_positions_from_observation`| **bool** | `True`| Whether or not to omit the x- and y-coordinates from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |
-
-    ### Version History
-
-    * v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
-    * v2: All continuous control environments now use mujoco_py >= 1.50
-    * v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.
-    * v0: Initial versions release (1.0.0)
-    """
-
     def __init__(
         self,
         xml_file="ant.xml",
```
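The v3 keyword arguments in the Arguments table can be pictured as a frozen dataclass of defaults that `gym.make` keywords override. This is a hypothetical sketch: `AntV3Config` and `make_config` are not gym code. The field names and defaults are copied from the table. An unknown keyword raises `TypeError`, just as it would for a Python constructor.

```python
from dataclasses import asdict, dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class AntV3Config:
    """Hypothetical container for the Ant-v3 constructor defaults listed in the docstring."""

    xml_file: str = "ant.xml"
    ctrl_cost_weight: float = 0.5
    contact_cost_weight: float = 5e-4
    healthy_reward: float = 1.0
    terminate_when_unhealthy: bool = True
    healthy_z_range: Tuple[float, float] = (0.2, 1.0)
    contact_force_range: Tuple[float, float] = (-1.0, 1.0)
    reset_noise_scale: float = 0.1
    exclude_current_positions_from_observation: bool = True


def make_config(**overrides) -> AntV3Config:
    """Apply keyword overrides to the defaults; an unknown name raises TypeError."""
    return replace(AntV3Config(), **overrides)


if __name__ == "__main__":
    print(asdict(make_config(ctrl_cost_weight=0.1))["ctrl_cost_weight"])  # 0.1
```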
```diff
@@ -199,7 +38,7 @@ def __init__(
             exclude_current_positions_from_observation
         )
 
-        mujoco_env.MujocoEnv.__init__(self, xml_file, 5)
+        mujoco_env.MujocoEnv.__init__(self, xml_file, 5, mujoco_bindings="mujoco_py")
 
     @property
     def healthy_reward(self):
```
