Description
Your current environment
vLLM Version: 0.8.5
🐛 Describe the bug
There are a few differences in `top_k` between vLLM and HuggingFace:
- meaning: to turn off `top_k`, vLLM expects `-1` while HuggingFace expects `None` or `0` (see the sketch after this list). https://github.com/huggingface/transformers/blob/v4.51.3/src/transformers/generation/utils.py#L1195
- default: HuggingFace uses `top_k=50` unless it is specified explicitly. https://github.com/huggingface/transformers/blob/v4.51.3/src/transformers/generation/configuration_utils.py#L425
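A minimal illustration of the mismatch, assuming `SamplingParams` is where vLLM performs the validation quoted in the error below:

```python
# Illustration only: the semantics of top_k differ between the two libraries.
# HuggingFace: top_k=None or top_k=0 disables top-k filtering (50 is the default).
# vLLM: top_k=-1 disables it, and 0 is rejected during parameter validation.
from vllm import SamplingParams

SamplingParams(top_k=-1)  # OK in vLLM: top-k sampling disabled
SamplingParams(top_k=50)  # OK in vLLM: keep the 50 most likely tokens
SamplingParams(top_k=0)   # raises ValueError in vLLM (HF would treat this as "disabled")
```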
Therefore, we'd write either `"top_k": 0` or `"top_k": null` to `generation_config.json` in order to recommend that users disable top-k sampling. However, `"top_k": 0` isn't working. For example, with Qwen/Qwen-1_8B (https://huggingface.co/Qwen/Qwen-1_8B/blob/main/generation_config.json#L8):
$ vllm serve Qwen/Qwen-1_8B --trust-remote-code
…
WARNING 05-02 03:26:16 [config.py:1239] Default sampling parameters have been overridden by the model's Hugging Face generation config recommended from the model creator. If this is not intended, please relaunch vLLM instance with `--generation-config vllm`.
INFO 05-02 03:26:16 [serving_chat.py:118] Using default chat sampling params from model: {'top_k': 0, 'top_p': 0.8, 'max_tokens': 512}
INFO 05-02 03:26:16 [serving_completion.py:61] Using default completion sampling params from model: {'top_k': 0, 'top_p': 0.8, 'max_tokens': 512}
…
INFO: Started server process [4971]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: 127.0.0.1:38088 - "POST /v1/completions HTTP/1.1" 400 Bad Request
$ curl localhost:8000/v1/completions --json '{"model": "Qwen/Qwen-1_8B", "prompt": "a"}'
{"object":"error","message":"top_k must be -1 (disable), or at least 1, got 0.","type":"BadRequestError","param":null,"code":400}
The difference in the meaning of `top_k` could be handled similarly to the `max_new_tokens` → `max_tokens` mapping, in https://github.com/vllm-project/vllm/blob/v0.8.5/vllm/config.py#L1217-L1234
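A rough sketch of what that handling could look like; the helper name `_normalize_hf_top_k` and the dict-based plumbing are assumptions for illustration, not the actual code in `vllm/config.py`:

```python
# Hedged sketch: normalize HuggingFace-style "disable top-k" values (0 or None)
# to vLLM's -1 when building default sampling params from generation_config.json.
# The function name and call site are hypothetical; only the value mapping is the point.
def _normalize_hf_top_k(params: dict) -> dict:
    if params.get("top_k", -1) in (0, None):
        params["top_k"] = -1  # vLLM's convention for "top-k disabled"
    return params

# Example with the defaults reported in the server log above:
defaults = {"top_k": 0, "top_p": 0.8, "max_tokens": 512}
print(_normalize_hf_top_k(defaults))  # -> {'top_k': -1, 'top_p': 0.8, 'max_tokens': 512}
```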
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.