
AttributeError: 'DistributedDataParallel' object has no attribute 'generate' #50

@boblee22

Description

🐛 Describe the bug

When I ran `accelerate launch examples/ppo_sentiments.py`, I got the error below. Am I supposed to unwrap the DDP model?

AttributeError: 'DistributedDataParallel' object has no attribute 'generate'
╭──────────────────────────── Traceback (most recent call last) ────────────────────────────╮
│                                                                                           │
│ /home/user/bob_workspace/code/trlx/examples/ppo_sentiments.py:38 in <module>              │
│   35 │   orch: PPOOrchestrator = get_orchestrator(cfg.train.orchestrator)(                │
│   36 │   │   model, pipeline, reward_fn=reward_fn, chunk_size=cfg.method.chunk_size       │
│   37 │   )                                                                                │
│ ❱ 38 │   orch.make_experience(cfg.method.num_rollouts)                                    │
│   39 │   model.learn()                                                                    │
│   40 │                                                                                    │
│   41 │   print("DONE!")                                                                   │
│ /home/user/bob_workspace/code/trlx/trlx/orchestrator/ppo_orchestrator.py:64 in            │
│ make_experience                                                                           │
│                                                                                           │
│    63 │   │   │                                                                           │
│ ❱  64 │   │   │   query_tensors, response_tensors, response_text = self.rl_model.act(batc │
│    65 │   │   │   texts = [q + r for q, r in zip(batch.text, response_text)]              │
│    66 │   │   │   scores = self.score(texts)                                              │
│    67                                                                                     │
│                                                                                           │
│ /home/user/bob_workspace/code/trlx/trlx/model/accelerate_base_model.py:121 in act     │
│                                                                                           │
│   118 │   │   │   │   self.dummy_input.to(self.accelerator.device)                        │
│   119 │   │   │   )  # Dummy pass to make things play nice with accelerate                │
│   120 │   │   │   # Removed synced gpus                                                   │
│ ❱ 121 │   │   │   response = self.model.generate(                                         │
│   122 │   │   │   │   query_tensors,                                                      │
│   123 │   │   │   │   pad_token_id=self.tokenizer.eos_token_id,                           │
│   124 │   │   │   │   **self.config.method.gen_kwargs                                     │
│                                                                                           │
│ /opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py:1185 in __getattr__     │
│                                                                                           │
│   1182 │   │   │   modules = self.__dict__['_modules']                                    │
│   1183 │   │   │   if name in modules:                                                    │
│   1184 │   │   │   │   return modules[name]                                               │
│ ❱ 1185 │   │   raise AttributeError("'{}' object has no attribute '{}'".format(           │
│   1186 │   │   │   type(self).__name__, name))                                            │
│   1187 │                                                                                  │
│   1188 │   def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:      │
╰───────────────────────────────────────────────────────────────────────────────────────────╯
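
For reference, a minimal self-contained sketch of what I think is going on (gpt2 is just a stand-in checkpoint; `accelerator.unwrap_model` is the Accelerate helper for getting the original module back out of the DDP wrapper):

    from accelerate import Accelerator
    from transformers import AutoModelForCausalLM, AutoTokenizer

    accelerator = Accelerator()
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Under a MULTI_GPU config, prepare() wraps the model in DistributedDataParallel.
    model = accelerator.prepare(model)

    input_ids = tokenizer("hello", return_tensors="pt").input_ids.to(accelerator.device)

    # model.generate(input_ids) raises the AttributeError above, because DDP only
    # proxies nn.Module attributes. Unwrapping first makes generate() reachable:
    output = accelerator.unwrap_model(model).generate(
        input_ids, pad_token_id=tokenizer.eos_token_id
    )

In `accelerate_base_model.py` that would mean calling `self.accelerator.unwrap_model(self.model).generate(...)` (or equivalently `self.model.module.generate(...)`) instead of `self.model.generate(...)`. Is that the intended fix, or is there a reason the wrapped model is used here?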

My accelerate config

- `Accelerate` version: 0.13.2
- Platform: Linux-5.4.0-107-generic-x86_64-with-glibc2.31
- Python version: 3.9.5
- Numpy version: 1.23.4
- PyTorch version (GPU?): 1.11.0 (True)
- `Accelerate` default config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: MULTI_GPU
        - mixed_precision: no
        - use_cpu: False
        - num_processes: 8
        - machine_rank: 0
        - num_machines: 1
        - gpu_ids: all
        - main_process_ip: None
        - main_process_port: None
        - rdzv_backend: static
        - same_network: True
        - main_training_function: main
        - deepspeed_config: {}
        - fsdp_config: {}
        - downcast_bf16: no

Which trlX version are you using?

trlx==1.0.0

Additional system and package information

No response
