Commit 5f720f7

[Serve.llm] Add a doc snippet to inform users about existing diffs between vllm serve and ray serve llm. (#54042)
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: kourosh hakhamaneshi <[email protected]>
Co-authored-by: Seiji Eicher <[email protected]>
1 parent 2032e6f commit 5f720f7

File tree

1 file changed: +14 -0 lines changed

doc/source/serve/llm/serving-llms.rst

Lines changed: 14 additions & 0 deletions
@@ -907,6 +907,20 @@ An example config is shown below:
       name: llm_app
       route_prefix: "/"
 
+
+There are some differences between `vllm serve` CLI and Ray Serve LLM endpoint behavior. How do I work around them?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We have opened an issue to track and fix this: https://github.com/ray-project/ray/issues/53533. Please add a comment to that issue thread if you hit a problem in this category. The example issues we have identified so far are:
+
+- Tool calling is not supported in the same way as in the `vllm serve` CLI. This is because the CLI has extra logic that maps CLI arguments onto additional layers in the API server critical path.
+
+- vLLM uses a different critical path to handle the tokenizer of Mistral models. This logic has not been ported to Ray Serve LLM because we want to find a more maintainable solution.
+
+- Some sampling parameters, such as `max_completion_tokens`, are supported differently in Ray Serve LLM and the `vllm serve` CLI (see the client sketch after this diff for how such parameters are passed).
+
+
+
 Usage Data Collection
 --------------------------
 We collect usage data to improve Ray Serve LLM.
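The sampling-parameter bullet above is easiest to see from the client side. The sketch below is not part of the commit; it is a minimal example of sending a chat completion request to a Ray Serve LLM deployment through its OpenAI-compatible endpoint. The base URL, API key, and model id are placeholder assumptions for a local deployment served under route_prefix "/"; whether a given parameter behaves identically under Ray Serve LLM and the `vllm serve` CLI is exactly the kind of difference tracked in the linked issue.

.. code-block:: python

    # Minimal client sketch. Assumptions (not from the commit): a local deployment
    # at http://localhost:8000 with route_prefix "/" and a model id "qwen-0.5b"
    # configured in the serving app.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")

    response = client.chat.completions.create(
        model="qwen-0.5b",  # must match the model id configured in the app
        messages=[{"role": "user", "content": "What is the capital of France?"}],
        # Sampling parameters are forwarded to the backend. As the doc snippet notes,
        # some of them (for example max_completion_tokens) may be handled differently
        # by Ray Serve LLM than by the `vllm serve` CLI.
        max_tokens=64,
        temperature=0.0,
    )
    print(response.choices[0].message.content)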
