
AssertionError in LLM.generate from version 0.2.1 #1386

@nathan-az

Description


I've only tested this using Zephyr (a Mistral-type model), but when calling LLM.generate in version 0.2.1 with tensor parallelism, the call fails with an AssertionError raised while the engine runs the workers.

Code to reproduce:

from vllm import LLM, SamplingParams

# Shard the model across 4 GPUs via tensor parallelism.
llm = LLM("HuggingFaceH4/zephyr-7b-alpha", tensor_parallel_size=4)

sampling_params = SamplingParams(
    temperature=0.1,
    top_p=0.95,
    max_tokens=400,
)
output = llm.generate(["Hello, how are you?", "Tell me a joke!"], sampling_params=sampling_params)

The same code block works fine in 0.2.0.
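On 0.2.0, generate returns one RequestOutput per prompt; continuing the snippet above, the results can be unpacked like this (a minimal sketch of the expected behavior, with the generated text varying run to run):

for request_output in output:
    # Each RequestOutput carries the original prompt and a list of
    # CompletionOutput objects; with the default n=1 there is one completion.
    print(request_output.prompt)
    print(request_output.outputs[0].text)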

Full traceback in version 0.2.1:

AssertionError                            Traceback (most recent call last)
File <command-2469377940857761>, line 10
      3 llm = LLM("HuggingFaceH4/zephyr-7b-alpha", tensor_parallel_size=4)
      5 sampling_params = SamplingParams(
      6     temperature=0.1,
      7     top_p=0.95,
      8     max_tokens=400
      9 )
---> 10 output = llm.generate(["Hello, how are you?", "Tell me a joke!"], sampling_params=sampling_params)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f955471c-8346-4f89-b3b6-d25bfb1d27ae/lib/python3.10/site-packages/vllm/entrypoints/llm.py:157, in LLM.generate(self, prompts, sampling_params, prompt_token_ids, use_tqdm)
    155         token_ids = prompt_token_ids[i]
    156     self._add_request(prompt, sampling_params, token_ids)
--> 157 return self._run_engine(use_tqdm)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f955471c-8346-4f89-b3b6-d25bfb1d27ae/lib/python3.10/site-packages/vllm/entrypoints/llm.py:177, in LLM._run_engine(self, use_tqdm)
    175 outputs: List[RequestOutput] = []
    176 while self.llm_engine.has_unfinished_requests():
--> 177     step_outputs = self.llm_engine.step()
    178     for output in step_outputs:
    179         if output.finished:

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f955471c-8346-4f89-b3b6-d25bfb1d27ae/lib/python3.10/site-packages/vllm/engine/llm_engine.py:562, in LLMEngine.step(self)
    559     return ignored
    561 # Execute the model.
--> 562 output = self._run_workers(
    563     "execute_model",
    564     seq_group_metadata_list=seq_group_metadata_list,
    565     blocks_to_swap_in=scheduler_outputs.blocks_to_swap_in,
    566     blocks_to_swap_out=scheduler_outputs.blocks_to_swap_out,
    567     blocks_to_copy=scheduler_outputs.blocks_to_copy,
    568 )
    570 return self._process_model_outputs(output, scheduler_outputs) + ignored

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f955471c-8346-4f89-b3b6-d25bfb1d27ae/lib/python3.10/site-packages/vllm/engine/llm_engine.py:712, in LLMEngine._run_workers(self, method, get_all_outputs, *args, **kwargs)
    710 output = all_outputs[0]
    711 for other_output in all_outputs[1:]:
--> 712     assert output == other_output
    713 return output

AssertionError: 
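For context on the failing check: per the traceback, LLMEngine._run_workers collects the execute_model result from every tensor-parallel worker and asserts that they are all identical. A minimal sketch of how that check can trip (an illustration only, not vLLM's actual sampler code; the per-worker seeds are hypothetical stand-ins for diverged RNG state):

import torch

# Mimic four tensor-parallel "workers" each sampling a token from identical
# probabilities, then apply the same equality check _run_workers performs.
probs = torch.softmax(torch.tensor([1.0, 2.0, 3.0, 4.0]), dim=-1)

def worker_sample(seed):
    # Hypothetical per-worker sampler; `seed` stands in for whatever RNG
    # state each worker happens to hold.
    gen = torch.Generator().manual_seed(seed)
    return torch.multinomial(probs, num_samples=1, generator=gen).item()

# Synchronized RNG: every worker samples the same token, the assert passes.
all_outputs = [worker_sample(0) for _ in range(4)]
for other_output in all_outputs[1:]:
    assert all_outputs[0] == other_output

# Diverged RNG: workers can sample different tokens, reproducing a bare
# AssertionError like the one in the traceback above.
all_outputs = [worker_sample(rank) for rank in range(4)]
try:
    for other_output in all_outputs[1:]:
        assert all_outputs[0] == other_output
except AssertionError:
    print("worker outputs diverged:", all_outputs)

Whatever the root cause turns out to be, pinning vllm==0.2.0 sidesteps the error for now, consistent with the report above.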
