When using vLLM + ChatGLM3 through the OpenAI-compatible API, no content is generated. The reason is that:
- In `conversation.py`, the ChatGLM3 template has a `stop_token_ids` list that includes ID `2`:

```python
register_conv_template(
    Conversation(
        name="chatglm3",
        system_template="<|system|>\n {system_message}",
        roles=("<|user|>", "<|assistant|>"),
        sep_style=SeparatorStyle.CHATGLM3,
        stop_token_ids=[
            64795,
            64797,
            2,
        ],  # "<|user|>", "<|observation|>", "</s>"
    )
)
```
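For reference, the registered stop IDs can be confirmed directly (a minimal sketch using FastChat's `get_conv_template`):

```python
from fastchat.conversation import get_conv_template

# Look up the registered ChatGLM3 template and print its stop token IDs.
conv = get_conv_template("chatglm3")
print(conv.stop_token_ids)  # expected: [64795, 64797, 2]
```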
- The `generate_stream` method in `vllm_worker.py` converts `stop_token_ids` into stop words. The token with ID `2` becomes an empty string after decoding:
```python
for tid in stop_token_ids:
    if tid is not None:
        stop.add(self.tokenizer.decode(tid))
```
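The empty string comes from the tokenizer dropping the `</s>` control token during decoding. A minimal sketch to reproduce, assuming the `THUDM/chatglm3-6b` tokenizer from the Hugging Face Hub:

```python
from transformers import AutoTokenizer

# ChatGLM3's tokenizer ships custom code, so trust_remote_code is required.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)

# ID 2 is the </s> control token; per this report it is dropped during
# decoding, so the resulting stop word is an empty string.
for tid in [64795, 64797, 2]:
    print(repr(tokenizer.decode(tid)))
# '<|user|>', '<|observation|>', ''
```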
Then the `SamplingParams` are:

```python
SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.5, top_p=1.0, top_k=-1, min_p=0.0, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['', '<|observation|>', '<|user|>'], stop_token_ids=[64795, 64797, 2, 2], ignore_eos=False, max_tokens=32750, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True)
```

Because `stop` contains the empty string `''`, generation terminates immediately and no content is returned.
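A possible workaround, sketched as a hypothetical patch to the stop-word loop in `vllm_worker.py` (not a confirmed fix from the project): skip stop words that decode to an empty string.

```python
# Hypothetical patch: only add non-empty decoded stop words, so the
# </s> token (ID 2) no longer injects '' into the stop set.
for tid in stop_token_ids:
    if tid is not None:
        s = self.tokenizer.decode(tid)
        if s != "":
            stop.add(s)
```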