## `ppo_pettingzoo_ma_atari.py`
[ppo_pettingzoo_ma_atari.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_pettingzoo_ma_atari.py) trains an agent to play Atari games via self-play. The self-play environment is implemented as a vectorized environment from [PettingZoo.ml](https://www.pettingzoo.ml/atari). The basic idea is to create a vectorized environment $E$ with `num_envs = N`, where $N$ is the number of players in the game. Say $N = 2$; then the 0-th sub-environment of $E$ returns the observation for player 0, and the 1-st sub-environment returns the observation for player 1. The two sub-environments then take a batch of 2 actions and execute them for player 0 and player 1, respectively. See "Vectorized architecture" in [The 37 Implementation Details of Proximal Policy Optimization](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/) for more detail.
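The per-player sub-environment layout can be sketched with a toy two-player vectorized environment. This is a hypothetical illustration of the idea only (the `TwoPlayerVecEnv` class and its dummy dynamics are made up); the actual script builds its vectorized environment from PettingZoo:

```python
import numpy as np

class TwoPlayerVecEnv:
    """Toy sketch of the vectorized self-play layout: with N = 2 players,
    sub-environment 0 holds player 0's observation and sub-environment 1
    holds player 1's. Hypothetical class for illustration only."""
    num_envs = 2  # N = number of players

    def __init__(self):
        # one observation slot per player (dummy 4-dim observations)
        self.obs = np.zeros((self.num_envs, 4))

    def reset(self):
        return self.obs

    def step(self, actions):
        # actions is a batch of size N: actions[0] is executed for player 0,
        # actions[1] for player 1, in the same underlying game
        assert actions.shape[0] == self.num_envs
        self.obs = self.obs + actions[:, None]  # dummy dynamics
        # dummy zero-sum reward: one player's gain is the other's loss
        rewards = np.array(
            [actions[1] - actions[0], actions[0] - actions[1]], dtype=np.float64
        )
        return self.obs, rewards, False, {}

env = TwoPlayerVecEnv()
obs = env.reset()                            # shape (2, 4): one row per player
obs, rew, done, info = env.step(np.array([1, 0]))
# rew sums to zero: what player 0 loses, player 1 gains
```

A single-agent PPO implementation can then train on this environment unchanged: from its point of view it is just a vectorized environment with `num_envs = 2`, and the policy plays both sides of the game.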
`ppo_pettingzoo_ma_atari.py` has the following features: