Closed
Labels: bug (Something isn't working)
Description
I've only tested this with Zephyr (a Mistral-type model), but when calling LLM.generate in version 0.2.1, I get an AssertionError from the workers.
Code to reproduce:
from vllm import LLM, SamplingParams
llm = LLM("HuggingFaceH4/zephyr-7b-alpha", tensor_parallel_size=4)
sampling_params = SamplingParams(
    temperature=0.1,
    top_p=0.95,
    max_tokens=400
)
output = llm.generate(["Hello, how are you?", "Tell me a joke!"], sampling_params=sampling_params)
Same code block works fine in 0.2.0.
Full traceback in version 0.2.1:
AssertionError Traceback (most recent call last)
File <command-2469377940857761>, line 10
3 llm = LLM("HuggingFaceH4/zephyr-7b-alpha", tensor_parallel_size=4)
5 sampling_params = SamplingParams(
6 temperature=0.1,
7 top_p=0.95,
8 max_tokens=400
9 )
---> 10 output = llm.generate(["Hello, how are you?", "Tell me a joke!"], sampling_params=sampling_params)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f955471c-8346-4f89-b3b6-d25bfb1d27ae/lib/python3.10/site-packages/vllm/entrypoints/llm.py:157, in LLM.generate(self, prompts, sampling_params, prompt_token_ids, use_tqdm)
155 token_ids = prompt_token_ids[i]
156 self._add_request(prompt, sampling_params, token_ids)
--> 157 return self._run_engine(use_tqdm)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f955471c-8346-4f89-b3b6-d25bfb1d27ae/lib/python3.10/site-packages/vllm/entrypoints/llm.py:177, in LLM._run_engine(self, use_tqdm)
175 outputs: List[RequestOutput] = []
176 while self.llm_engine.has_unfinished_requests():
--> 177 step_outputs = self.llm_engine.step()
178 for output in step_outputs:
179 if output.finished:
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f955471c-8346-4f89-b3b6-d25bfb1d27ae/lib/python3.10/site-packages/vllm/engine/llm_engine.py:562, in LLMEngine.step(self)
559 return ignored
561 # Execute the model.
--> 562 output = self._run_workers(
563 "execute_model",
564 seq_group_metadata_list=seq_group_metadata_list,
565 blocks_to_swap_in=scheduler_outputs.blocks_to_swap_in,
566 blocks_to_swap_out=scheduler_outputs.blocks_to_swap_out,
567 blocks_to_copy=scheduler_outputs.blocks_to_copy,
568 )
570 return self._process_model_outputs(output, scheduler_outputs) + ignored
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f955471c-8346-4f89-b3b6-d25bfb1d27ae/lib/python3.10/site-packages/vllm/engine/llm_engine.py:712, in LLMEngine._run_workers(self, method, get_all_outputs, *args, **kwargs)
710 output = all_outputs[0]
711 for other_output in all_outputs[1:]:
--> 712 assert output == other_output
713 return output
AssertionError:
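For context, the frames at the bottom of the traceback (llm_engine.py lines 710-713) show that `_run_workers` gathers one result per tensor-parallel worker and asserts they are all equal with a bare `assert`, which is why the AssertionError has no message. A minimal standalone sketch of that pattern (not vLLM code; the function name here is just illustrative):

```python
def run_workers(all_outputs):
    """Mimic the output check in vllm/engine/llm_engine.py:_run_workers.

    Each element of all_outputs is the result returned by one
    tensor-parallel worker; all workers are expected to agree.
    """
    output = all_outputs[0]
    for other_output in all_outputs[1:]:
        # Bare assert: if any worker diverges, this raises
        # AssertionError with an empty message, as in the traceback.
        assert output == other_output
    return output


# Four workers agreeing passes the check:
print(run_workers([[1, 2, 3]] * 4))

# One worker diverging reproduces the bare AssertionError:
try:
    run_workers([[1, 2, 3], [1, 2, 4], [1, 2, 3], [1, 2, 3]])
except AssertionError:
    print("AssertionError: workers returned different results")
```

So the failure reported here means that with `tensor_parallel_size=4`, the four workers are returning results that no longer compare equal in 0.2.1, even though they did in 0.2.0.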