Description
Reproduction
In grpo_trainer.py, the prompts are decoded with:
prompts_text = self.processing_class.batch_decode(
    prompt_ids, skip_special_tokens=False, clean_up_tokenization_spaces=False
)
For Qwen models, a prompt sample looks like this:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
hello, who are you?<|im_end|>
<|im_start|>assistant
Because of the padding, the output of self.processing_class.batch_decode may look like this:
<|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
hello, who are you?<|im_end|>
<|im_start|>assistant
I think the leading "<|endoftext|>...<|endoftext|>" padding should be excluded from the decoded prompt; otherwise, sending such a sample to vllm_server may produce very strange responses.
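One possible workaround, sketched below, is to strip the run of pad tokens from the start of each decoded prompt before it is sent on. This is only an illustration and not TRL code: `strip_leading_pad` is a hypothetical helper, it assumes the padding sits on the left as in the sample above, and it hard-codes "<|endoftext|>" as the pad token (in the trainer this would presumably come from the tokenizer's pad_token attribute).

```python
import re


def strip_leading_pad(texts, pad_token="<|endoftext|>"):
    """Remove the run of pad tokens at the start of each decoded prompt.

    Hypothetical helper, not part of TRL. Assumes left padding and that
    pad_token is the literal string the tokenizer uses for padding.
    """
    # Match one or more consecutive pad tokens anchored at the start only,
    # so any later occurrence inside the prompt is left untouched.
    pattern = re.compile(rf"^(?:{re.escape(pad_token)})+")
    return [pattern.sub("", text) for text in texts]


# A decoded prompt with three leading pad tokens, as in the report above.
padded = (
    "<|endoftext|>" * 3
    + "<|im_start|>user\nhello, who are you?<|im_end|>\n<|im_start|>assistant\n"
)
unpadded = "<|im_start|>user\nhi<|im_end|>\n<|im_start|>assistant\n"

cleaned = strip_leading_pad([padded, unpadded])
```

Stripping on the decoded text keeps the change local to the batch_decode call. An alternative, under the same assumptions, would be to drop the padded positions from prompt_ids using the attention mask before decoding.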
System Info
transformers 4.51.3
trl 0.20.0.dev0
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
- Any traceback provided is complete