Description
Reproduction
Command
```shell
accelerate launch --config_file examples/accelerate_configs/deepspeed_zero3.yaml examples/scripts/sft_gemma3.py
```
Error Message
```
[rank0]: Traceback (most recent call last):
[rank0]: File "/workspace/trl/examples/scripts/sft_gemma3.py", line 67, in <module>
[rank0]: main()
[rank0]: File "/workspace/trl/examples/scripts/sft_gemma3.py", line 60, in main
[rank0]: trainer.train()
[rank0]: File "/workspace/transformers/src/transformers/trainer.py", line 2318, in train
[rank0]: return inner_training_loop(
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/workspace/transformers/src/transformers/trainer.py", line 2613, in _inner_training_loop
[rank0]: batch_samples, num_items_in_batch = self.get_batch_samples(epoch_iterator, num_batches, args.device)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/workspace/transformers/src/transformers/trainer.py", line 5436, in get_batch_samples
[rank0]: batch_samples.append(next(epoch_iterator))
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/workspace/accelerate/src/accelerate/data_loader.py", line 567, in __iter__
[rank0]: current_batch = next(dataloader_iter)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/utils/data/dataloader.py", line 734, in __next__
[rank0]: data = self._next_data()
[rank0]: ^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/utils/data/dataloader.py", line 790, in _next_data
[rank0]: data = self._dataset_fetcher.fetch(index) # may raise StopIteration
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/workspace/transformers/src/transformers/data/data_collator.py", line 46, in __call__
[rank0]: return self.torch_call(features)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/workspace/trl/trl/trainer/sft_trainer.py", line 373, in torch_call
[rank0]: return self._collate_language_modeling(examples)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/workspace/trl/trl/trainer/sft_trainer.py", line 380, in _collate_language_modeling
[rank0]: images = [example["images"] for example in examples]
[rank0]: ~~~~~~~^^^^^^^^^^
[rank0]: KeyError: 'images'
```
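The dataset used by this script has no "images" column, so the lookup `example["images"]` in `_collate_language_modeling` fails with the `KeyError` above.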
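To make the failure mode concrete, here is a minimal, self-contained sketch that imitates the lookup from the traceback on a text-only batch. The example data and both helper functions (`collect_images_unguarded`, `collect_images_guarded`) are hypothetical; this is not TRL's collator code, and the guarded variant is only one possible way to skip the image path when no images are present.

```python
from typing import Any, Optional

# Hypothetical text-only batch: examples carry "input_ids" but no "images" key,
# similar to what the collator receives from a dataset without an "images" column.
examples: list[dict[str, Any]] = [
    {"input_ids": [2, 10, 11]},
    {"input_ids": [2, 12]},
]


def collect_images_unguarded(examples: list[dict[str, Any]]) -> list[Any]:
    # Same pattern as the failing line in the traceback: every example is
    # indexed with "images", so a single missing key raises KeyError.
    return [example["images"] for example in examples]


def collect_images_guarded(examples: list[dict[str, Any]]) -> Optional[list[Any]]:
    # Hypothetical defensive variant (not TRL's code): gather images only when
    # every example has them; otherwise (including mixed batches, as a
    # simplification) treat the batch as text-only and return None.
    if all("images" in example for example in examples):
        return [example["images"] for example in examples]
    return None


try:
    collect_images_unguarded(examples)
except KeyError as err:
    print(f"KeyError: {err}")  # → KeyError: 'images'

print(collect_images_guarded(examples))  # → None
```

With the guarded variant, a text-only batch simply yields no images, while a batch in which every example has an "images" entry is collected exactly as before.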
System Info
- Platform: Linux-6.8.0-53-generic-x86_64-with-glibc2.39
- Python version: 3.12.3
- TRL version: 0.22.0.dev0+1de38fc
- PyTorch version: 2.8.0+xpu
- accelerator(s): Intel(R) Data Center GPU Max 1550, Intel(R) Data Center GPU Max 1550, Intel(R) Data Center GPU Max 1550, Intel(R) Data Center GPU Max 1550, Intel(R) Data Center GPU Max 1550, Intel(R) Data Center GPU Max 1550, Intel(R) Data Center GPU Max 1550, Intel(R) Data Center GPU Max 1550
- Transformers version: 4.56.0.dev0
- Accelerate version: 1.11.0.dev0
- Accelerate config: not found
- Datasets version: 3.6.0
- HF Hub version: 0.34.4
- bitsandbytes version: 0.47.0.dev0
- DeepSpeed version: 0.17.5+047a7599
- Diffusers version: 0.36.0.dev0
- Liger-Kernel version: 0.6.1
- LLM-Blender version: not installed
- OpenAI version: 1.99.9
- PEFT version: 0.17.0
- vLLM version: 0.10.2.dev14+gbf756321c.xpu
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks (no screenshots; more on code blocks)
- Any traceback provided is complete