Describe the bug
The observation and info returned at the last step in PointMaze with `continuing_task=True` aren't updated (i.e. they still contain the old goal). This is not the intended semantics: in a common RL loop, the agent will use the stale observation to predict an action toward the old goal instead of the new one.
See related issue: Farama-Foundation/Minari#265
See:
Gymnasium-Robotics/gymnasium_robotics/envs/maze/point_maze.py, lines 392 to 406 at 3719d9d:

```python
def step(self, action):
    obs, _, _, _, info = self.point_env.step(action)
    obs_dict = self._get_obs(obs)
    reward = self.compute_reward(obs_dict["achieved_goal"], self.goal, info)
    terminated = self.compute_terminated(obs_dict["achieved_goal"], self.goal, info)
    truncated = self.compute_truncated(obs_dict["achieved_goal"], self.goal, info)
    info["success"] = bool(
        np.linalg.norm(obs_dict["achieved_goal"] - self.goal) <= 0.45
    )
    # Update the goal position if necessary
    self.update_goal(obs_dict["achieved_goal"])
    return obs_dict, reward, terminated, truncated, info
```
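The ordering problem is that `obs_dict` is built *before* `update_goal` may resample the goal, so the returned `desired_goal` lags one step behind. The toy class below (hypothetical, not the Gymnasium-Robotics API) reproduces that pattern and sketches one possible fix, assuming it is acceptable to re-read the goal into the observation after resampling:

```python
import numpy as np

class MiniGoalEnv:
    """Toy env reproducing the stale-goal pattern: the goal is resampled
    *after* the observation dict is built, so the returned 'desired_goal'
    can lag one step behind. For illustration only."""

    def __init__(self, fix_bug: bool):
        self.fix_bug = fix_bug
        self.goal = np.array([0.0, 0.0])  # current goal
        self.pos = np.array([0.0, 0.0])   # agent position

    def step(self, action):
        self.pos = self.pos + action
        # Observation is built from the goal as it was *before* resampling.
        obs = {"achieved_goal": self.pos.copy(),
               "desired_goal": self.goal.copy()}
        success = np.linalg.norm(obs["achieved_goal"] - self.goal) <= 0.45
        if success:
            # Continuing task: resample a new goal once the old one is reached.
            self.goal = np.array([5.0, 5.0])
        if self.fix_bug:
            # Possible fix: re-read the (possibly new) goal so the agent
            # sees it in the very step where the old goal was reached.
            obs["desired_goal"] = self.goal.copy()
        return obs

# The agent starts on the goal, so step() resamples immediately.
buggy = MiniGoalEnv(fix_bug=False).step(np.zeros(2))
fixed = MiniGoalEnv(fix_bug=True).step(np.zeros(2))
print(buggy["desired_goal"])  # stale goal:   [0. 0.]
print(fixed["desired_goal"])  # updated goal: [5. 5.]
```

In the real `step()` above, the analogous change would be refreshing `obs_dict["desired_goal"]` after the `self.update_goal(...)` call; this is a sketch of the idea, not a vetted patch.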
Code example
Reproducing this requires a near-expert policy that actually reaches the goal; see https://github.com/Farama-Foundation/minari-dataset-generation-scripts/blob/main/scripts/pointmaze/create_pointmaze_dataset.py
LorisGaven