Description
In IsaacGymEnvs, rl-games + multiGPU seems to have some issues. As shown in the screenshot, rl-games + multiGPU uses twice the amount of data yet performs worse than the single-GPU setting in Ant.
This issue tracks the investigation.
Proposed debugging route
I suggest first making sure there is no loss in sample efficiency before scaling to more envs, by matching the implementation details of our prototype in CleanRL: https://cleanrl-git-new-multi-gpu-vwxyzjn.vercel.app/rl-algorithms/ppo/#implementation-details_6.
Identified issues:
1. Seeding logic and configuration issue
We need to seed the multiGPU processes with different seeds to decorrelate experience; otherwise the processes will produce exactly the same observations.
Configuration-wise, we can set the overall seed with `params.seed` and the env seed with `params.config.env_config.seed`, so if `params.config.env_config.seed` is set but `params.seed` is not set, we get identical observations from the environments, as shown below:
This is probably ok since the agent still samples different actions, but it's nonetheless a problem. The correct implementation is to use `seed = seed + local_rank`.
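A minimal sketch of that fix, assuming rl-games' horovod-based multi-GPU path; `set_seed_per_rank` is a hypothetical helper, and `seed` stands in for whatever value is resolved from `params.seed`:

```python
import random

import numpy as np
import torch
import horovod.torch as hvd


def set_seed_per_rank(seed: int) -> int:
    """Offset the base seed by the process rank so each worker collects decorrelated experience."""
    hvd.init()
    # The fix: a different seed per process instead of the same seed everywhere.
    seed = seed + hvd.local_rank()
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    return seed
```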
2. Stepping logic issue
After fixing #163, I was able to match the sample efficiency of the single-GPU setting:
However, the wall time is slower than I had expected. In a separate benchmark I made with CleanRL, the experiments show that horovod should make Ant step about 20% faster.
Maybe it's the overhead of averaging stats across processes? In the CleanRL benchmark experiments I did not mess with the stats at all.
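For reference, a rough sketch of the kind of per-update stats averaging I suspect, assuming horovod; `avg_stats` and the tensor names are hypothetical, not rl-games' actual API:

```python
import torch
import horovod.torch as hvd


def avg_stats(running_mean: torch.Tensor, running_var: torch.Tensor) -> None:
    """Average running observation statistics across all processes.

    Assumes hvd.init() has already been called. Each allreduce is a blocking
    collective, so doing this on every update adds synchronization overhead
    that the CleanRL benchmark (which never averaged stats) did not pay.
    """
    running_mean.copy_(hvd.allreduce(running_mean, op=hvd.Average))
    running_var.copy_(hvd.allreduce(running_var, op=hvd.Average))
```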