We have opened an issue to track and fix this: https://github.com/ray-project/ray/issues/53533. Please add comments to this issue thread if you are facing problems that fall under this category. We have identified the following example issues so far:
- Tool calling is not supported in the same way as in the `vllm serve` CLI, because the CLI includes extra logic that maps its arguments to additional layers in the API server's critical path.
- vLLM uses a separate critical path for handling the tokenizer of Mistral models. This logic has not been ported to Ray Serve LLM, because we want to find a more maintainable solution first.
- Some sampling parameters, such as `max_completion_tokens`, are supported differently in Ray Serve LLM and the `vllm serve` CLI (see the client-side sketch after this list).
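
As an illustration of the last point, the sketch below shows one defensive pattern a client can use while these gaps exist: try the newer OpenAI-style `max_completion_tokens` parameter first, and fall back to the older `max_tokens` if the server or SDK rejects it. This is a minimal sketch, assuming a Ray Serve LLM deployment that exposes an OpenAI-compatible endpoint; the base URL, API key, and model id below are hypothetical placeholders, not values taken from this issue.

```python
from openai import OpenAI, BadRequestError

# Hypothetical endpoint and model id for a Ray Serve LLM deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")
messages = [{"role": "user", "content": "Say hello in one sentence."}]

try:
    # Newer OpenAI-style parameter; Ray Serve LLM and `vllm serve`
    # may interpret or reject it differently.
    resp = client.chat.completions.create(
        model="my-model",
        messages=messages,
        max_completion_tokens=32,
    )
except (BadRequestError, TypeError):
    # Fall back to the older `max_tokens` parameter.
    resp = client.chat.completions.create(
        model="my-model",
        messages=messages,
        max_tokens=32,
    )

print(resp.choices[0].message.content)
```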