PPO fp16 #238

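### 🐛 Describe the bug

Running `examples/ppo_sentiments.py` with DeepSpeed ZeRO stage 2 and `mixed_precision: fp16` fails during `make_experience`: the GPT-2 embedding layer receives `input_ids` as a `torch.cuda.HalfTensor` instead of an integer tensor.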

```
Traceback (most recent call last):
  File "examples/ppo_sentiments.py", line 61, in <module>
    main()
  File "examples/ppo_sentiments.py", line 52, in main
    trlx.train(
  File "/trlx/trlx/trlx.py", line 81, in train
    orch.make_experience(config.method.num_rollouts)
  File "/trlx/trlx/orchestrator/ppo_orchestrator.py", line 161, in make_experience
    logits, *_, values = self.trainer.model(
  File "/trlx/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/trlx/.env/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
    return func(*args, **kwargs)
  File "/trlx/.env/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1836, in forward
    loss = self.module(*inputs, **kwargs)
  File "/trlx/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/trlx/trlx/trainer/nn/ppo_models.py", line 393, in forward
    transformer_outputs = self.base_model.transformer(**forward_kwargs)
  File "/trlx/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/trlx/.env/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 832, in forward
    inputs_embeds = self.wte(input_ids)
  File "/trlx/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/trlx/.env/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 160, in forward
    return F.embedding(
  File "/trlx/.env/lib/python3.8/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.HalfTensor instead (while checking arguments for embedding)
```
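
For reference, the last frame of the traceback can be reproduced in isolation. The snippet below is a minimal CPU-only sketch and is not taken from trlX: `wte` is a small stand-in for the GPT-2 token embedding, and the sizes are arbitrary. It only shows that `nn.Embedding` rejects half-precision indices; it does not identify where the ids get cast in the trlX/DeepSpeed path.

```python
import torch
import torch.nn as nn

# Small stand-in for the GPT-2 token embedding (`wte`); sizes are arbitrary.
wte = nn.Embedding(num_embeddings=16, embedding_dim=4)

ids_long = torch.tensor([[1, 2, 3]])  # integer ids, dtype torch.int64
ids_half = ids_long.half()            # ids as the embedding saw them, per the error

# Half-precision indices are rejected by the embedding lookup.
try:
    wte(ids_half)
    raised = False
except RuntimeError as err:
    raised = True
    print(f"rejected {ids_half.dtype} indices: {err}")

# The same ids kept as integers work as expected.
out = wte(ids_long)
print(tuple(out.shape))

# Casting back with .long() is not a safe fix in general: fp16 cannot
# represent every integer in GPT-2's vocabulary range exactly.
roundtrip = torch.tensor([50256]).half().long().item()
print(roundtrip == 50256)
```

Since fp16 only represents integers exactly up to 2048, casting the ids back with `.long()` after the fact would silently corrupt larger token ids. The ids most likely need to be kept out of the fp16 cast in the first place.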

Accelerate config (`fp16-zero2-deepspeed.yaml`):

```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config:
  deepspeed_multinode_launcher: standard
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: false
fsdp_config: {}
main_process_port: 1234
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 1
use_cpu: false
```


Launch command:

```bash
accelerate launch --num_processes 1 --config_file fp16-zero2-deepspeed.yaml examples/ppo_sentiments.py
```

### Which trlX version are you using?

[main](https://github.com/CarperAI/trlx/tree/2855d0ce3aca741f6fa9e7bc41d4cc14a0b83ee1)

### Additional system and package information

```
torch==1.13.0+cu116
deepspeed==0.8.0
```
