
TypeError: forward() missing 1 required positional argument: 'inputs' when training #176

@mmmmllll1

I'm trying to run training and am getting the following error:

Exception in thread Thread-19:                                                                       
Traceback (most recent call last):                                                                                                                                                                         
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner 
    self.run()                                  
  File "/usr/lib/python3.7/threading.py", line 870, in run           
    self._target(*self._args, **self._kwargs)                                                                                                                                                              
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/application/utils.py", line 66, in background_task
    raise e                                                                                                                                                                                                
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/application/utils.py", line 62, in background_task
    func(logging=logger, **kwargs)              
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/training/train.py", line 219, in train
    y, y_pred = process_batch(batch, model)                                                                                                                                                                
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/training/tacotron2_model/utils.py", line 88, in process_batch
    y_pred = model(batch, mask_size=output_length_size, alignment_mask_size=input_length_size)
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl          
    return forward_call(*input, **kwargs)                                                            
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/venv/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/venv/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/venv/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()                                                                                 
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/venv/lib/python3.7/site-packages/torch/_utils.py", line 425, in reraise
    raise self.exc_type(msg)                                                                                                                                                                               
TypeError: Caught TypeError in replica 6 on device 6.
Original Traceback (most recent call last):                                                          
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/venv/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)                                                                
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)                                                            
TypeError: forward() missing 1 required positional argument: 'inputs'   

A quick search suggests this might be related to pytorch/pytorch#31460?

I had to reduce the batch size to 128; otherwise the first GPU runs out of memory with the dreaded "CUDA out of memory" error.

This machine has eight NVIDIA V100 16GB GPUs in it (I've excluded the T1000 using CUDA_VISIBLE_DEVICES). See below:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB            On | 00000000:0D:00.0 Off |                    0 |
| N/A   40C    P0               56W / 300W|  16147MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-16GB            On | 00000000:0E:00.0 Off |                    0 |
| N/A   38C    P0               57W / 300W|  12529MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-16GB            On | 00000000:14:00.0 Off |                    0 |
| N/A   36C    P0               57W / 300W|  11233MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-16GB            On | 00000000:15:00.0 Off |                    0 |
| N/A   39C    P0               57W / 300W|  10655MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA T1000 8GB                On | 00000000:82:00.0 Off |                  N/A |
| 35%   33C    P8               N/A /  50W|      6MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2-16GB            On | 00000000:8B:00.0 Off |                    0 |
| N/A   37C    P0               40W / 300W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2-16GB            On | 00000000:8C:00.0 Off |                    0 |
| N/A   34C    P0               41W / 300W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2-16GB            On | 00000000:8F:00.0 Off |                    0 |
| N/A   36C    P0               38W / 300W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   8  Tesla V100-SXM2-16GB            On | 00000000:90:00.0 Off |                    0 |
| N/A   37C    P0               40W / 300W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
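For context, pytorch/pytorch#31460 describes nn.DataParallel calling a replica with no positional arguments when the scatter step produces fewer input chunks than there are replicas (for example, when the dimension being split is smaller than the number of GPUs), so the replica sees only keyword arguments and raises exactly this "missing 1 required positional argument" error. The torch-free sketch below (a hypothetical `scatter_batch` helper, not the real torch API) illustrates the arithmetic of that failure mode:

```python
# Torch-free sketch of how DataParallel-style scattering can starve a
# replica. scatter_batch is a hypothetical stand-in for torch's scatter.

def scatter_batch(batch, num_devices):
    """Split a batch into at most num_devices chunks, like torch.chunk."""
    chunk = -(-len(batch) // num_devices)  # ceiling division
    return [batch[i:i + chunk] for i in range(0, len(batch), chunk)]

batch = list(range(6))             # e.g. a small final batch of an epoch
chunks = scatter_batch(batch, 8)   # 8 GPUs, but only 6 chunks produced
print(len(chunks))                 # -> 6: replicas 6 and 7 get no inputs

# Each replica is invoked as module(*args, **kwargs); a replica whose
# positional args are empty but whose kwargs were still replicated fails
# with: TypeError: forward() missing 1 required positional argument
```

If that is what's happening here, common workarounds are dropping the incomplete final batch (`DataLoader(..., drop_last=True)`) or keeping the batch size a multiple of the GPU count.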

I tried limiting the number of GPUs to 4 with a batch size of 64, and it seems to work. Output below:

Mon May 29 02:17:19 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB            On | 00000000:0D:00.0 Off |                    0 |
| N/A   43C    P0               75W / 300W|  11675MiB / 16384MiB |     18%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-16GB            On | 00000000:0E:00.0 Off |                    0 |
| N/A   40C    P0               74W / 300W|   7899MiB / 16384MiB |     17%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-16GB            On | 00000000:14:00.0 Off |                    0 |
| N/A   38C    P0               72W / 300W|   7263MiB / 16384MiB |     17%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-16GB            On | 00000000:15:00.0 Off |                    0 |
| N/A   42C    P0               75W / 300W|   6923MiB / 16384MiB |     15%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA T1000 8GB                On | 00000000:82:00.0 Off |                  N/A |
| 35%   32C    P8               N/A /  50W|      6MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2-16GB            On | 00000000:8B:00.0 Off |                    0 |
| N/A   37C    P0               40W / 300W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2-16GB            On | 00000000:8C:00.0 Off |                    0 |
| N/A   34C    P0               41W / 300W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2-16GB            On | 00000000:8F:00.0 Off |                    0 |
| N/A   36C    P0               38W / 300W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   8  Tesla V100-SXM2-16GB            On | 00000000:90:00.0 Off |                    0 |
| N/A   37C    P0               40W / 300W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   1428813      C   python                                    11668MiB |
|    1   N/A  N/A   1428813      C   python                                     7892MiB |
|    2   N/A  N/A   1428813      C   python                                     7256MiB |
|    3   N/A  N/A   1428813      C   python                                     6916MiB |
+---------------------------------------------------------------------------------------+
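For anyone trying to reproduce the working four-GPU run, one way to restrict the visible devices is via the environment (a minimal sketch; the indices are taken from the nvidia-smi listing above and will differ on other machines):

```shell
# Expose only the first four V100s to CUDA applications. GPU 4 (the
# T1000) stays excluded either way. Set this before launching training.
export CUDA_VISIBLE_DEVICES=0,1,2,3
echo "$CUDA_VISIBLE_DEVICES"   # the training process now sees 4 GPUs
```

Note that after this, PyTorch renumbers the visible devices from 0, so `torch.cuda.device_count()` reports 4.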

I'd appreciate any advice on getting this working.
