🐛 Describe the bug
Running the PPO sentiments example with DeepSpeed ZeRO stage 2 and fp16 mixed precision fails during rollout generation with the following traceback:

Traceback (most recent call last):
File "examples/ppo_sentiments.py", line 61, in <module>
main()
File "examples/ppo_sentiments.py", line 52, in main
trlx.train(
File "/trlx/trlx/trlx.py", line 81, in train
orch.make_experience(config.method.num_rollouts)
File "/trlx/trlx/orchestrator/ppo_orchestrator.py", line 161, in make_experience
logits, *_, values = self.trainer.model(
File "/trlx/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/trlx/.env/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
return func(*args, **kwargs)
File "/trlx/.env/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1836, in forward
loss = self.module(*inputs, **kwargs)
File "/trlx/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/trlx/trlx/trainer/nn/ppo_models.py", line 393, in forward
transformer_outputs = self.base_model.transformer(**forward_kwargs)
File "/trlx/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/trlx/.env/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 832, in forward
inputs_embeds = self.wte(input_ids)
File "/trlx/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/trlx/.env/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 160, in forward
return F.embedding(
File "/trlx/.env/lib/python3.8/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.HalfTensor instead (while checking arguments for embedding)
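For what it's worth, the failure can be reproduced outside of trlx with a few lines of plain PyTorch. This is only a sketch to show that the error is about the dtype of the index tensor handed to the embedding, not about the embedding weights; it assumes the token ids somehow arrive at GPT-2's wte layer as fp16 (the cast itself presumably happens between the DeepSpeed fp16 engine and the model forward):

import torch
import torch.nn.functional as F

weight = torch.randn(10, 4)              # toy embedding table: 10 ids, dim 4
ids = torch.tensor([1, 2, 3])            # int64 indices: this lookup works
print(F.embedding(ids, weight).shape)    # torch.Size([3, 4])

# Casting the indices to half precision triggers the same RuntimeError:
# "Expected tensor for argument #1 'indices' to have one of the following
#  scalar types: Long, Int; but got ... HalfTensor instead"
F.embedding(ids.half(), weight)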
Accelerate config (fp16-zero2-deepspeed.yaml):

compute_environment: LOCAL_MACHINE
deepspeed_config:
  deepspeed_multinode_launcher: standard
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: false
fsdp_config: {}
main_process_port: 1234
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 1
use_cpu: false
Launch command:

accelerate launch --num_processes 1 --config_file fp16-zero2-deepspeed.yaml examples/ppo_sentiments.py
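In case it helps with triage, here is a purely hypothetical workaround sketch (the function and variable names are placeholders, not trlx internals): force the ids back to torch.long right before the embedding-bound forward call. Note that if the ids really are stored as fp16 at that point, values above 2048 can no longer be represented exactly, so the real fix has to be preventing the upstream cast rather than undoing it:

import torch

def ensure_long_ids(input_ids: torch.Tensor) -> torch.Tensor:
    # Embedding lookups need integer indices. If an fp16 engine has already
    # cast the ids to half, cast them back; fp16 only represents integers up
    # to 2048 exactly, so large vocabulary ids may already be corrupted.
    if input_ids.dtype not in (torch.long, torch.int):
        return input_ids.long()
    return input_ids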
Which trlX version are you using?
Additional system and package information
torch==1.13.0+cu116 deepspeed==0.8.0