CUDA OOM with large prompt length #127

@AlekseyKorshuk

Description

🐛 Describe the bug

Unable to train gpt2-large with ILQL at max_length=1024 on 4x A40 GPUs with ~900 GB of RAM because of a CUDA OOM error.
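For context on why the OOM appears at this prompt length, here is a back-of-the-envelope activation-memory estimate. The gpt2-large dimensions (36 layers, d_model=1280, 20 heads) are the published model config; the per-layer hidden-state multiplier is a rough assumption of mine, not a measurement:

```python
# Rough activation-memory estimate for gpt2-large at max_length=1024.
# Dimensions are the published gpt2-large config; the factor of 8 for
# MLP/LayerNorm intermediates is a coarse assumption, not a measurement.

def activation_bytes(seq_len, batch_size, n_layers=36, d_model=1280,
                     n_heads=20, bytes_per_el=2):  # fp16 -> 2 bytes
    # Hidden-state activations kept for backward: one (seq, d_model)
    # tensor per layer, times ~8 for the MLP/LayerNorm intermediates.
    hidden = 8 * n_layers * batch_size * seq_len * d_model * bytes_per_el
    # Attention score matrices: one (heads, seq, seq) tensor per layer,
    # quadratic in sequence length.
    attn = n_layers * batch_size * n_heads * seq_len * seq_len * bytes_per_el
    return hidden + attn

print(f"{activation_bytes(1024, 1) / 2**30:.1f} GiB per sample at seq 1024")
print(f"{activation_bytes(512, 1) / 2**30:.1f} GiB per sample at seq 512")
```

Because the attention term is quadratic in sequence length, going from 512 to 1024 roughly triples the per-sample activation memory in this estimate, before ILQL's extra value heads are counted.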

Accelerate env

- `Accelerate` version: 0.15.0
- Platform: Linux-5.13.0-40-generic-x86_64-with-glibc2.29
- Python version: 3.8.10
- Numpy version: 1.23.5
- PyTorch version (GPU?): 1.12.1+cu113 (True)
- `Accelerate` default config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: DEEPSPEED
        - mixed_precision: fp16
        - use_cpu: False
        - dynamo_backend: NO
        - num_processes: 4
        - machine_rank: 0
        - num_machines: 1
        - gpu_ids: None
        - main_process_ip: None
        - main_process_port: None
        - rdzv_backend: static
        - same_network: True
        - main_training_function: main
        - deepspeed_config: {'gradient_accumulation_steps': 1, 'offload_optimizer_device': 'cpu', 'offload_param_device': 'cpu', 'zero3_init_flag': True, 'zero_stage': 2}
        - fsdp_config: {}
        - megatron_lm_config: {}
        - downcast_bf16: no
        - tpu_name: None
        - tpu_zone: None
        - command_file: None
        - commands: None
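The config above runs ZeRO stage 2 with optimizer and parameter offload to CPU. One possible mitigation (an untested assumption on my part, not something verified in this report) is ZeRO stage 3, which additionally partitions the fp16 model parameters across the 4 GPUs; only the `zero_stage` value changes:

```
- deepspeed_config: {'gradient_accumulation_steps': 1, 'offload_optimizer_device': 'cpu', 'offload_param_device': 'cpu', 'zero3_init_flag': True, 'zero_stage': 3}
```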

How to reproduce

You can use my fork with a small change: https://github.com/AlekseyKorshuk/trlx/tree/ilql-dalio

accelerate launch examples/dalio/ilql_dalio.py

Which trlX version are you using?

trlx==0.3.0

Metadata

Labels

bug (Something isn't working)
