🐛 Describe the bug
When I ran `accelerate launch examples/ppo_sentiments.py`, the error below occurred. Am I supposed to unwrap the DDP model before calling `generate`? (A sketch of what I mean follows the traceback.)
AttributeError: 'DistributedDataParallel' object has no attribute 'generate'
Traceback (most recent call last):

/home/user/bob_workspace/code/trlx/examples/ppo_sentiments.py:38 in <module>
    35 │ orch: PPOOrchestrator = get_orchestrator(cfg.train.orchestrator)(
    36 │ │   model, pipeline, reward_fn=reward_fn, chunk_size=cfg.method.chunk_size
    37 │ )
  ❱ 38 │ orch.make_experience(cfg.method.num_rollouts)
    39 │ model.learn()
    40 │
    41 │ print("DONE!")

/home/user/bob_workspace/code/trlx/trlx/orchestrator/ppo_orchestrator.py:64 in make_experience
    63 │ │ │
  ❱ 64 │ │ │ query_tensors, response_tensors, response_text = self.rl_model.act(batc
    65 │ │ │ texts = [q + r for q, r in zip(batch.text, response_text)]
    66 │ │ │ scores = self.score(texts)
    67 │

/home/user/bob_workspace/code/trlx/trlx/model/accelerate_base_model.py:121 in act
   118 │ │ │ │ self.dummy_input.to(self.accelerator.device)
   119 │ │ │ )  # Dummy pass to make things play nice with accelerate
   120 │ │ │ # Removed synced gpus
 ❱ 121 │ │ │ response = self.model.generate(
   122 │ │ │ │ query_tensors,
   123 │ │ │ │ pad_token_id=self.tokenizer.eos_token_id,
   124 │ │ │ │ **self.config.method.gen_kwargs

/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py:1185 in __getattr__
  1182 │ │ │ modules = self.__dict__['_modules']
  1183 │ │ │ if name in modules:
  1184 │ │ │ │ return modules[name]
❱ 1185 │ │ raise AttributeError("'{}' object has no attribute '{}'".format(
  1186 │ │ │ type(self).__name__, name))
  1187 │
  1188 │ def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:
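For reference, this is a minimal standalone sketch of the unwrapping I had in mind (not trlx code; I'm assuming `Accelerator.unwrap_model` from `accelerate` and a small GPT-2 model just to illustrate that `generate` lives on the underlying transformers model, not on the DDP wrapper that `prepare()` adds under MULTI_GPU):

```python
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer

accelerator = Accelerator()
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Under a MULTI_GPU config, prepare() wraps the model in DistributedDataParallel,
# which does not proxy arbitrary attributes, so model.generate(...) raises the
# AttributeError above.
model = accelerator.prepare(model)

query_tensors = tokenizer("hello", return_tensors="pt").input_ids.to(accelerator.device)

# Unwrapping returns the underlying transformers model, which does have generate().
unwrapped = accelerator.unwrap_model(model)
response = unwrapped.generate(query_tensors, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(response[0]))
```

Calling `model.module.generate(...)` directly would presumably also work for plain DDP, but `unwrap_model` also covers the single-process case where no wrapper is added.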
My accelerate config
- `Accelerate` version: 0.13.2
- Platform: Linux-5.4.0-107-generic-x86_64-with-glibc2.31
- Python version: 3.9.5
- Numpy version: 1.23.4
- PyTorch version (GPU?): 1.11.0 (True)
- `Accelerate` default config:
  - compute_environment: LOCAL_MACHINE
  - distributed_type: MULTI_GPU
  - mixed_precision: no
  - use_cpu: False
  - num_processes: 8
  - machine_rank: 0
  - num_machines: 1
  - gpu_ids: all
  - main_process_ip: None
  - main_process_port: None
  - rdzv_backend: static
  - same_network: True
  - main_training_function: main
  - deepspeed_config: {}
  - fsdp_config: {}
  - downcast_bf16: no
Which trlX version are you using?
trlx==1.0.0
Additional system and package information
No response