Closed
Labels
🐛 bug: Something isn't working
Description
Reproduction
from datasets import Dataset
from trl import apply_chat_template
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B", use_fast=True)

# Two conversational prompt/completion pairs
dataset_dict = {
    "prompt": [
        [{"role": "user", "content": "What color is the sky?"}],
        [{"role": "user", "content": "Where is the sun?"}],
    ],
    "completion": [
        [{"role": "assistant", "content": "It is blue."}],
        [{"role": "assistant", "content": "In the sky."}],
    ],
}
dataset = Dataset.from_dict(dataset_dict)

# Raises the ValueError shown below
dataset = dataset.map(apply_chat_template, fn_kwargs={"tokenizer": tokenizer})
Output:
[rank4]: File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 560, in wrapper
[rank4]: out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
[rank4]: File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 3055, in map
[rank4]: for rank, done, content in Dataset._map_single(**dataset_kwargs):
[rank4]: File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 3428, in _map_single
[rank4]: example = apply_function_on_filtered_inputs(example, i, offset=offset)
[rank4]: File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 3320, in apply_function_on_filtered_inputs
[rank4]: processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
[rank4]: File "/usr/local/lib/python3.10/dist-packages/trl/data_utils.py", line 152, in apply_chat_template
[rank4]: raise ValueError(error_message.format(prompt, prompt_completion))
[rank4]: ValueError: The chat template applied to the prompt + completion does not start with the chat template applied to the prompt alone. This can indicate that the chat template is not supported by TRL.
[rank4]: **Prompt**:
[rank4]: <|begin▁of▁sentence|><|User|>What color is the sky?<|Assistant|><think>
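The error message describes a prefix check: the rendered prompt must be a leading substring of the rendered prompt + completion. Below is a minimal sketch of that check, not TRL's actual code. `render` and `split_prompt_completion` are hypothetical stand-ins, and the toy template is an assumption that only mimics the behavior visible in the traceback, where the generation prompt ends in `<|Assistant|><think>`. It further assumes that a stored assistant message is rendered without that `<think>` opener.

```python
# Toy stand-in for tokenizer.apply_chat_template; NOT the real DeepSeek template.
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"


def render(messages, add_generation_prompt=False):
    """Hypothetical renderer whose generation prompt opens a <think> block."""
    text = BOS
    for message in messages:
        if message["role"] == "user":
            text += "<|User|>" + message["content"]
        elif message["role"] == "assistant":
            # Assumption: stored assistant turns are rendered without "<think>".
            text += "<|Assistant|>" + message["content"] + EOS
    if add_generation_prompt:
        text += "<|Assistant|><think>\n"
    return text


def split_prompt_completion(prompt, completion):
    """Sketch of the prefix check described by the error message."""
    prompt_text = render(prompt, add_generation_prompt=True)
    full_text = render(prompt + completion)
    if not full_text.startswith(prompt_text):
        raise ValueError(
            "prompt rendering is not a prefix of prompt + completion rendering"
        )
    # The completion is whatever follows the rendered prompt.
    return prompt_text, full_text[len(prompt_text):]


prompt = [{"role": "user", "content": "What color is the sky?"}]
completion = [{"role": "assistant", "content": "It is blue."}]

try:
    split_prompt_completion(prompt, completion)
    prefix_ok = True
except ValueError:
    prefix_ok = False

print("prefix check passed:", prefix_ok)
```

Under these assumptions the check fails because the prompt rendering ends in `<think>` while the full rendering continues with the assistant text directly after `<|Assistant|>`, so the two strings diverge at that point.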
System Info
trl==0.17.0
datasets==3.1.0
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks (no screenshots, more on code blocks)
- Any traceback provided is complete