Description
Your current environment
vLLM Version: 0.8.5
🐛 Describe the bug
There are a few differences in `top_k` between vLLM and HuggingFace:
- meaning: to turn off `top_k`, vLLM expects `-1` while HuggingFace expects `None` or `0` (see the sketch after this list). https://github.com/huggingface/transformers/blob/v4.51.3/src/transformers/generation/utils.py#L1195
- default: HuggingFace uses `top_k=50` unless it is specified explicitly. https://github.com/huggingface/transformers/blob/v4.51.3/src/transformers/generation/configuration_utils.py#L425
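A minimal illustration of the mismatch, assuming `SamplingParams` is where vLLM performs the validation quoted in the error below:

```python
# Illustration only: the semantics of top_k differ between the two libraries.
# HuggingFace: top_k=None or top_k=0 disables top-k filtering (50 is the default).
# vLLM: top_k=-1 disables it, and 0 is rejected during parameter validation.
from vllm import SamplingParams

SamplingParams(top_k=-1)  # OK in vLLM: top-k sampling disabled
SamplingParams(top_k=50)  # OK in vLLM: keep the 50 most likely tokens
SamplingParams(top_k=0)   # raises ValueError in vLLM (HF would treat this as "disabled")
```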
Therefore, we'd write either `"top_k": 0` or `"top_k": null` to `generation_config.json` in order to recommend that users disable top-k sampling. However, `"top_k": 0` isn't working. For example, with Qwen/Qwen-1_8B (https://huggingface.co/Qwen/Qwen-1_8B/blob/main/generation_config.json#L8):
$ vllm serve Qwen/Qwen-1_8B --trust-remote-code
…
WARNING 05-02 03:26:16 [config.py:1239] Default sampling parameters have been overridden by the model's Hugging Face generation config recommended from the model creator. If this is not intended, please relaunch vLLM instance with `--generation-config vllm`.
INFO 05-02 03:26:16 [serving_chat.py:118] Using default chat sampling params from model: {'top_k': 0, 'top_p': 0.8, 'max_tokens': 512}
INFO 05-02 03:26:16 [serving_completion.py:61] Using default completion sampling params from model: {'top_k': 0, 'top_p': 0.8, 'max_tokens': 512}
…
INFO: Started server process [4971]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: 127.0.0.1:38088 - "POST /v1/completions HTTP/1.1" 400 Bad Request
$ curl localhost:8000/v1/completions --json '{"model": "Qwen/Qwen-1_8B", "prompt": "a"}'
{"object":"error","message":"top_k must be -1 (disable), or at least 1, got 0.","type":"BadRequestError","param":null,"code":400}
The difference in the meaning of `top_k` could be handled similarly to the `max_new_tokens` → `max_tokens` mapping, in https://github.com/vllm-project/vllm/blob/v0.8.5/vllm/config.py#L1217-L1234
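A rough sketch of what that handling could look like; the helper name `_normalize_hf_top_k` and the dict-based plumbing are assumptions for illustration, not the actual code in `vllm/config.py`:

```python
# Hedged sketch: normalize HuggingFace-style "disable top-k" values (0 or None)
# to vLLM's -1 when building default sampling params from generation_config.json.
# The function name and call site are hypothetical; only the value mapping is the point.
def _normalize_hf_top_k(params: dict) -> dict:
    if params.get("top_k", -1) in (0, None):
        params["top_k"] = -1  # vLLM's convention for "top-k disabled"
    return params

# Example with the defaults reported in the server log above:
defaults = {"top_k": 0, "top_p": 0.8, "max_tokens": 512}
print(_normalize_hf_top_k(defaults))  # -> {'top_k': -1, 'top_p': 0.8, 'max_tokens': 512}
```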
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.