question about prompts_text processing #3686

Closed


Labels

❓ question, 🏋 GRPO

Reproduction

In grpo_trainer.py, the prompts are decoded like this:
prompts_text = self.processing_class.batch_decode(
    prompt_ids, skip_special_tokens=False, clean_up_tokenization_spaces=False
)

For Qwen models, a sample prompt looks like this:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
hello, who are you?<|im_end|>
<|im_start|>assistant

Because the batch is padded, the output of self.processing_class.batch_decode may instead look like this:
<|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
hello, who are you?<|im_end|>
<|im_start|>assistant

I think the decoded text should exclude the leading "<|endoftext|>...<|endoftext|>" padding. Otherwise, sending such a sample to vllm_server may produce very strange responses.
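To make the suggestion concrete, here is a minimal sketch of stripping the leading pad tokens from the decoded strings. The helper name strip_leading_pad is hypothetical and not part of TRL, and the sketch assumes that the pad token is "<|endoftext|>" and that padding appears only at the start of each prompt, as in the sample above.

```python
import re


def strip_leading_pad(texts, pad_token):
    """Remove any run of pad_token at the start of each decoded prompt.

    Hypothetical helper for illustration; not part of TRL.
    """
    # Match one or more consecutive pad tokens anchored at the start only.
    pattern = re.compile(rf"^(?:{re.escape(pad_token)})+")
    return [pattern.sub("", text) for text in texts]


# Assumption: the pad token for this tokenizer is "<|endoftext|>".
pad = "<|endoftext|>"
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "hello, who are you?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# One padded prompt (as in the decoded sample) and one unpadded prompt.
padded = pad * 10 + prompt
cleaned = strip_leading_pad([padded, prompt], pad)
```

In the trainer, this could presumably be applied to prompts_text right after batch_decode, with the pad token taken from the tokenizer. Anchoring the pattern at the start of the string leaves any "<|endoftext|>" that appears later in the prompt untouched.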

System Info

transformers 4.51.3
trl 0.20.0.dev0

Checklist

I have checked that my issue isn't already filed (see open issues)
I have included my system information
Any code provided is minimal, complete, and reproducible (more on MREs)
Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
Any traceback provided is complete
