Skip to content

sft_gemma3 example doesn't work #3957

@yao-matrix

Description

@yao-matrix

Reproduction

Command

accelerate launch --config_file examples/accelerate_configs/deepspeed_zero3.yaml examples/scripts/sft_gemma3.py

Error Message

[rank0]: Traceback (most recent call last):
[rank0]: File "/workspace/trl/examples/scripts/sft_gemma3.py", line 67, in
[rank0]: main()
[rank0]: File "/workspace/trl/examples/scripts/sft_gemma3.py", line 60, in main
[rank0]: trainer.train()
[rank0]: File "/workspace/transformers/src/transformers/trainer.py", line 2318, in train
[rank0]: return inner_training_loop(
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/workspace/transformers/src/transformers/trainer.py", line 2613, in _inner_training_loop
[rank0]: batch_samples, num_items_in_batch = self.get_batch_samples(epoch_iterator, num_batches, args.device)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/workspace/transformers/src/transformers/trainer.py", line 5436, in get_batch_samples
[rank0]: batch_samples.append(next(epoch_iterator))
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/workspace/accelerate/src/accelerate/data_loader.py", line 567, in iter
[rank0]: current_batch = next(dataloader_iter)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/utils/data/dataloader.py", line 734, in next
[rank0]: data = self._next_data()
[rank0]: ^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/utils/data/dataloader.py", line 790, in _next_data
[rank0]: data = self._dataset_fetcher.fetch(index) # may raise StopIteration
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/workspace/transformers/src/transformers/data/data_collator.py", line 46, in call
[rank0]: return self.torch_call(features)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/workspace/trl/trl/trainer/sft_trainer.py", line 373, in torch_call
[rank0]: return self._collate_language_modeling(examples)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/workspace/trl/trl/trainer/sft_trainer.py", line 380, in _collate_language_modeling
[rank0]: images = [example["images"] for example in examples]
[rank0]: ~~~~~~~^^^^^^^^^^
[rank0]: KeyError: 'images'

The dataset doesn't have "images".

System Info

  • Platform: Linux-6.8.0-53-generic-x86_64-with-glibc2.39
  • Python version: 3.12.3
  • TRL version: 0.22.0.dev0+1de38fc
  • PyTorch version: 2.8.0+xpu
  • accelerator(s): Intel(R) Data Center GPU Max 1550, Intel(R) Data Center GPU Max 1550, Intel(R) Data Center GPU Max 1550, Intel(R) Data Center GPU Max 1550, Intel(R) Data Center GPU Max 1550, Intel(R) Data Center GPU Max 1550, Intel(R) Data Center GPU Max 1550, Intel(R) Data Center GPU Max 1550
  • Transformers version: 4.56.0.dev0
  • Accelerate version: 1.11.0.dev0
  • Accelerate config: not found
  • Datasets version: 3.6.0
  • HF Hub version: 0.34.4
  • bitsandbytes version: 0.47.0.dev0
  • DeepSpeed version: 0.17.5+047a7599
  • Diffusers version: 0.36.0.dev0
  • Liger-Kernel version: 0.6.1
  • LLM-Blender version: not installed
  • OpenAI version: 1.99.9
  • PEFT version: 0.17.0
  • vLLM version: 0.10.2.dev14+gbf756321c.xpu

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
  • Any traceback provided is complete

Metadata

Metadata

Assignees

Labels

⚡ PEFTRelated to PEFT⚡accelerateRelated to accelerate🐛 bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions