When using vLLM + ChatGLM3 through the OpenAI-compatible API, no content is generated. The reason is that:
- In `conversation.py`, the ChatGLM3 template has a `stop_token_ids` list that includes ID `2`:

```python
register_conv_template(
    Conversation(
        name="chatglm3",
        system_template="<|system|>\n {system_message}",
        roles=("<|user|>", "<|assistant|>"),
        sep_style=SeparatorStyle.CHATGLM3,
        stop_token_ids=[
            64795,
            64797,
            2,
        ],  # "<|user|>", "<|observation|>", "</s>"
    )
)
```
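For reference, the registered stop IDs can be confirmed directly (a minimal sketch using FastChat's `get_conv_template`):

```python
from fastchat.conversation import get_conv_template

# Look up the registered ChatGLM3 template and print its stop token IDs.
conv = get_conv_template("chatglm3")
print(conv.stop_token_ids)  # expected: [64795, 64797, 2]
```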
- The `generate_stream` method in `vllm_worker.py` converts `stop_token_ids` into stop words. The token with ID `2` becomes an empty string after decoding:
```python
for tid in stop_token_ids:
    if tid is not None:
        stop.add(self.tokenizer.decode(tid))
```
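The empty string comes from the tokenizer dropping the `</s>` control token during decoding. A minimal sketch to reproduce, assuming the `THUDM/chatglm3-6b` tokenizer from the Hugging Face Hub:

```python
from transformers import AutoTokenizer

# ChatGLM3's tokenizer ships custom code, so trust_remote_code is required.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)

# ID 2 is the </s> control token; per this report it is dropped during
# decoding, so the resulting stop word is an empty string.
for tid in [64795, 64797, 2]:
    print(repr(tokenizer.decode(tid)))
# '<|user|>', '<|observation|>', ''
```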
Then the `SamplingParams` are:

```python
SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.5, top_p=1.0, top_k=-1, min_p=0.0, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['', '<|observation|>', '<|user|>'], stop_token_ids=[64795, 64797, 2, 2], ignore_eos=False, max_tokens=32750, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True)
```

Because `stop` contains the empty string `''`, generation terminates immediately and no content is returned.
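A possible workaround, sketched as a hypothetical patch to the stop-word loop in `vllm_worker.py` (not a confirmed fix from the project): skip stop words that decode to an empty string.

```python
# Hypothetical patch: only add non-empty decoded stop words, so the
# </s> token (ID 2) no longer injects '' into the stop set.
for tid in stop_token_ids:
    if tid is not None:
        s = self.tokenizer.decode(tid)
        if s != "":
            stop.add(s)
```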