question about prompts_text processing #3686

@AGI-player

Reproduction

In grpo_trainer.py, the prompts are decoded like this:
prompts_text = self.processing_class.batch_decode(
prompt_ids, skip_special_tokens=False, clean_up_tokenization_spaces=False
)

For Qwen models, a sample prompt looks like this:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
hello, who are you?<|im_end|>
<|im_start|>assistant

Because of padding, the output of self.processing_class.batch_decode may look like this:
<|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
hello, who are you?<|im_end|>
<|im_start|>assistant

I think the leading "<|endoftext|>...<|endoftext|>" padding should be excluded from prompts_text. Otherwise, sending such a sample to vllm_server may produce very strange responses.
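
To illustrate the suggestion, here is a minimal sketch of stripping the leading padding from the decoded text before it is sent anywhere. It is not taken from grpo_trainer.py: the helper name strip_left_padding is hypothetical, and the pad token "<|endoftext|>" is assumed from the decoded sample above.

```python
import re

# Assumption: the pad token is "<|endoftext|>", as seen in the decoded sample above.
PAD_TOKEN = "<|endoftext|>"


def strip_left_padding(texts, pad_token=PAD_TOKEN):
    """Remove any run of pad tokens at the very start of each decoded prompt.

    Hypothetical helper for illustration; only leading pad tokens are removed,
    so a pad/EOS token appearing later in the text is left untouched.
    """
    pattern = re.compile(rf"^(?:{re.escape(pad_token)})+")
    return [pattern.sub("", text) for text in texts]


# Stand-in for one element of prompts_text after batch_decode with left padding.
padded = (
    PAD_TOKEN * 10
    + "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    + "<|im_start|>user\nhello, who are you?<|im_end|>\n"
    + "<|im_start|>assistant\n"
)

cleaned = strip_left_padding([padded])[0]
print(cleaned)
```

In the trainer this would presumably be applied to prompts_text right after batch_decode. An alternative would be to drop the padded positions from prompt_ids (using the attention mask) before decoding, which avoids relying on the pad token's string form.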

System Info

transformers 4.51.3
trl 0.20.0.dev0

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
  • Any traceback provided is complete

Labels: ❓ question (Seeking clarification or more information), 🏋 GRPO (Related to GRPO)
