Description
🐛 Bug
The whole point of the Pipe module is to split a batch into chunks microbatches and then process them through the stages of the pipeline, achieving parallelism by having multiple microbatches processed on different GPUs at the same time. The benchmark in benchmarks/transformer.py doesn't specify chunks, so it defaults to chunks=1, which doesn't exercise the microbatch logic at all. Moreover, changing the benchmark to set chunks=2 or chunks=4 yields a slowdown, whereas I would expect more chunks to mean more parallelism.
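For context, here is a minimal sketch of the microbatching idea; the layer sizes, balance, and import path are illustrative assumptions and are not taken from the benchmark:

```python
import torch
import torch.nn as nn
from fairscale.nn import Pipe  # import path assumed

# A toy sequential model split across two GPUs: balance=[2, 1] puts the
# first two layers on the first device and the last layer on the second.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
model = Pipe(model, balance=[2, 1], chunks=4)

# A batch of 64 is split into 4 microbatches of 16; while the second GPU
# works on microbatch i, the first GPU can already start on microbatch i+1.
x = torch.randn(64, 512, device="cuda:0")
y = model(x)  # output lands on the last device in the pipeline
```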
Command
PYTHONPATH=$PWD python benchmarks/transformer.py
To Reproduce
Steps to reproduce the behavior:
1. Run PYTHONPATH=$PWD python benchmarks/transformer.py
2. Change L263 to specify chunks=2 and rerun the command, e.g. p = pipe.Pipe(model, balance, chunks=2) (see the sketch after this list for passing chunks from the command line)
3. Change L263 to specify chunks=4 and rerun the command
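To avoid editing the file for each run, the chunk count could be taken from the command line; a hypothetical sketch, the --chunks flag and its wiring are not part of the current benchmark:

```python
import argparse

parser = argparse.ArgumentParser(description="Pipe transformer benchmark (sketch)")
parser.add_argument("--chunks", type=int, default=1,
                    help="number of microbatches each batch is split into")
args = parser.parse_args()

# ... model and balance built as in benchmarks/transformer.py ...
p = pipe.Pipe(model, balance, chunks=args.chunks)
```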
Observed results (wps = words per second):
chunks=1: test loss 5.57 | time: 30.72s | words: 2304870 | wps: 75028.93
chunks=2: test loss 5.58 | time: 53.51s | words: 2304870 | wps: 43077.41
chunks=4: test loss 5.57 | time: 81.93s | words: 2304870 | wps: 28133.60
Expected behavior
chunks=N is faster than chunks=1 for some N when there is more than one device.
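To make that expectation concrete: under an idealized GPipe-style schedule (ignoring communication and uneven stage times), K stages and M microbatches leave a pipeline "bubble" of roughly (K - 1) / (M + K - 1), so utilization should improve as chunks grows. A quick back-of-envelope sketch:

```python
# Idealized GPipe-style estimate: with K stages and M microbatches,
# M * K of the (M + K - 1) * K stage-slots do useful work.
def pipeline_utilization(num_stages: int, num_microbatches: int) -> float:
    return num_microbatches / (num_microbatches + num_stages - 1)

# With the 2 GPUs from this report:
for chunks in (1, 2, 4, 8):
    print(f"chunks={chunks}: ideal utilization ~ {pipeline_utilization(2, chunks):.2f}")
```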
Environment
Collecting environment information...
PyTorch version: 1.6.0
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Quadro GP100
GPU 1: Quadro GP100
Nvidia driver version: 418.116.00
cuDNN version: Could not collect
Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] torch==1.6.0
[pip3] torchtext==0.7.0
[pip3] torchvision==0.7.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.1.243 h6bb024c_0
[conda] mkl 2020.1 217
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.1.0 py37h23d657b_0
[conda] mkl_random 1.1.1 py37h0da4684_0 conda-forge
[conda] numpy 1.19.1 py37hbc911f0_0
[conda] numpy-base 1.19.1 py37hfa32c7d_0
[conda] pytorch 1.6.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] torchtext 0.7.0 pypi_0 pypi
[conda] torchvision 0.7.0 py37_cu101 pytorch