benchmarks/transformer.py doesn't specify chunks when constructing Pipe #51

@froody

🐛 Bug

The whole point of the Pipe module is to split a batch into #chunks microbatches and process them through the stages of the pipeline, so that multiple microbatches are processed on different GPUs at the same time. The benchmark in benchmarks/transformer.py doesn't specify chunks, so it defaults to chunks=1 and exercises none of the microbatch logic. Further, changing the benchmark to set chunks=2 or chunks=4 yields a slowdown, whereas I would expect more chunks to mean more parallelism.
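To make the expectation concrete, here is a rough sketch of the ideal schedule, not a model of what Pipe actually does. It assumes perfectly balanced stages, no communication or synchronization overhead, and compute time that scales linearly with microbatch size. The function name and parameters are hypothetical and are not part of the benchmark.

```python
def ideal_pipeline_time(stages: int, chunks: int, batch_time: float = 1.0) -> float:
    """Idealized time for one batch to pass through a balanced pipeline.

    batch_time is the time the full batch needs to run through all stages
    back to back (the chunks=1 case). Hypothetical helper for illustration.
    """
    if stages < 1 or chunks < 1:
        raise ValueError("stages and chunks must be >= 1")
    # Time one microbatch spends on one stage, assuming linear scaling.
    step = batch_time / (stages * chunks)
    # The last microbatch leaves the last stage after chunks + stages - 1 steps.
    return (chunks + stages - 1) * step


if __name__ == "__main__":
    # Two stages, matching the two GPUs in this report.
    for chunks in (1, 2, 4):
        print(f"chunks={chunks}: {ideal_pipeline_time(2, chunks):.3f}")
```

Under these assumptions, two stages should take 0.75 of the chunks=1 time with chunks=2 and 0.625 with chunks=4. Real runs add per-microbatch overhead, so a smaller gain would be unsurprising, but a slowdown is not what this model predicts.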

Command

PYTHONPATH=$PWD python benchmarks/transformer.py

To Reproduce

Steps to reproduce the behavior:

  1. PYTHONPATH=$PWD python benchmarks/transformer.py
  2. Change L263 to specify chunks=2 and rerun the command, e.g. p = pipe.Pipe(model, balance, chunks=2)
  3. Change L263 to specify chunks=4 and rerun the command

chunks=1: test loss 5.57 | time: 30.72s | words: 2304870 | wps: 75028.93
chunks=2: test loss 5.58 | time: 53.51s | words: 2304870 | wps: 43077.41
chunks=4: test loss 5.57 | time: 81.93s | words: 2304870 | wps: 28133.60
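For reference, the slowdown relative to chunks=1 can be computed directly from the timings above. This is only a small sketch over the three reported runs; the dictionary and function names are made up for illustration.

```python
# Wall-clock time in seconds for each chunks setting, copied from the runs above.
reported_times = {1: 30.72, 2: 53.51, 4: 81.93}


def slowdown_vs_baseline(times: dict, baseline: int = 1) -> dict:
    """Return each run's time divided by the baseline run's time."""
    base = times[baseline]
    return {chunks: t / base for chunks, t in times.items()}


if __name__ == "__main__":
    for chunks, factor in slowdown_vs_baseline(reported_times).items():
        print(f"chunks={chunks}: {factor:.2f}x the chunks=1 time")
```

With these numbers, chunks=2 takes roughly 1.7 times as long as chunks=1, and chunks=4 roughly 2.7 times as long.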

Expected behavior

chunks=N should be faster than chunks=1 for some N > 1 when there is more than one device.

Environment

Collecting environment information...
PyTorch version: 1.6.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Quadro GP100
GPU 1: Quadro GP100

Nvidia driver version: 418.116.00
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] torch==1.6.0
[pip3] torchtext==0.7.0
[pip3] torchvision==0.7.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.1.243 h6bb024c_0
[conda] mkl 2020.1 217
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.1.0 py37h23d657b_0
[conda] mkl_random 1.1.1 py37h0da4684_0 conda-forge
[conda] numpy 1.19.1 py37hbc911f0_0
[conda] numpy-base 1.19.1 py37hfa32c7d_0
[conda] pytorch 1.6.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] torchtext 0.7.0 pypi_0 pypi
[conda] torchvision 0.7.0 py37_cu101 pytorch
