Labels: bug (Something isn't working)
Description
Your current environment

The output of `python collect_env.py`:

```text
Your output of `python collect_env.py` here
```
Model Input Dumps
No response
🐛 Describe the bug
Cause
The `advance_step` API was changed in PR #8378. The change was made to `flash_attn.py` and `flashinfer.py`, but not to the other backends.
Code snippet highlighting the difference in the `advance_step` API:

`flash_attn.py`:
```python
def advance_step(self,
                 model_input: "ModelInputForGPUWithSamplingMetadata",
                 sampled_token_ids: Optional[torch.Tensor],
                 block_size: int,
                 num_seqs: int,
                 num_queries: int,
                 turn_prefills_into_decodes: bool = False):
```

`rocm_flash_attn.py`:
```python
def advance_step(self, model_input: "ModelInputForGPUWithSamplingMetadata",
                 sampled_token_ids: Optional[torch.Tensor],
                 block_size: int, num_seqs: int, num_queries: int):
```
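The mismatch can be reproduced in isolation. The stubs below are hypothetical (they are not vLLM code); they only mirror the two signatures above, plus a caller that forwards `turn_prefills_into_decodes` the way the multi-step runner does:

```python
# Hypothetical stubs mirroring the two backend signatures (not vLLM code).

class FlashAttentionMetadataStub:
    """Mirrors the flash_attn.py signature after PR #8378."""
    def advance_step(self, model_input, sampled_token_ids, block_size,
                     num_seqs, num_queries,
                     turn_prefills_into_decodes=False):
        return "advanced"

class ROCmFlashAttentionMetadataStub:
    """Mirrors the unchanged rocm_flash_attn.py signature."""
    def advance_step(self, model_input, sampled_token_ids, block_size,
                     num_seqs, num_queries):
        return "advanced"

def call_like_multi_step_runner(attn_metadata):
    # The multi-step runner always forwards the new keyword argument.
    return attn_metadata.advance_step(
        model_input=None, sampled_token_ids=None,
        block_size=16, num_seqs=1, num_queries=1,
        turn_prefills_into_decodes=False)

print(call_like_multi_step_runner(FlashAttentionMetadataStub()))
try:
    call_like_multi_step_runner(ROCmFlashAttentionMetadataStub())
except TypeError as exc:
    # The "unexpected keyword argument" TypeError seen in the logs below.
    print(exc)
```

Only the updated backend accepts the call; the ROCm stub raises the same `TypeError` as the traceback in this report.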
It is unclear whether this is a bug or whether the multi-step feature is simply not supported on AMD. I saw other PRs stating that the multi-step feature works on AMD, e.g. #8474.
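If multi-step is indeed meant to work on AMD, one possible direction is to make the ROCm signature accept the new keyword and fail loudly only when the unsupported mode is actually requested. This is only a sketch with hypothetical names, not a reviewed patch:

```python
class ROCmFlashAttentionMetadataSketch:
    """Hypothetical sketch (not vLLM code): accept the new keyword for API
    compatibility, and reject only the behaviour the backend lacks."""

    def advance_step(self, model_input, sampled_token_ids, block_size,
                     num_seqs, num_queries,
                     turn_prefills_into_decodes=False):
        if turn_prefills_into_decodes:
            raise NotImplementedError(
                "turn_prefills_into_decodes is not supported by the "
                "ROCm flash-attention backend")
        # ... the existing ROCm advance_step logic would run here ...
        return "advanced"

meta = ROCmFlashAttentionMetadataSketch()
# The multi-step runner's call now succeeds when the flag is False.
meta.advance_step(None, None, 16, 1, 1, turn_prefills_into_decodes=False)
```

With this shape, the `TypeError` disappears and callers that request the unsupported mode get an explicit error instead.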
Logs
The error log and traceback are as follows:

```text
ERROR 10-06 16:38:45 engine.py:157] TypeError("advance_step() got an unexpected keyword argument 'turn_prefills_into_decodes'")
ERROR 10-06 16:38:45 engine.py:157] Traceback (most recent call last):
  File "/home/aac/apps/rocm611-0929/vllm-fix-spec-amd/vllm/engine/multiprocessing/engine.py", line 155, in start
    self.run_engine_loop()
  File "/home/aac/apps/rocm611-0929/vllm-fix-spec-amd/vllm/engine/multiprocessing/engine.py", line 218, in run_engine_loop
    request_outputs = self.engine_step()
  File "/home/aac/apps/rocm611-0929/vllm-fix-spec-amd/vllm/engine/multiprocessing/engine.py", line 236, in engine_step
    raise e
  File "/home/aac/apps/rocm611-0929/vllm-fix-spec-amd/vllm/engine/multiprocessing/engine.py", line 227, in engine_step
    return self.engine.step()
  File "/home/aac/apps/rocm611-0929/vllm-fix-spec-amd/vllm/engine/llm_engine.py", line 1404, in step
    outputs = self.model_executor.execute_model(
  File "/home/aac/apps/rocm611-0929/vllm-fix-spec-amd/vllm/executor/distributed_gpu_executor.py", line 78, in execute_model
    driver_outputs = self._driver_execute_model(execute_model_req)
  File "/home/aac/apps/rocm611-0929/vllm-fix-spec-amd/vllm/executor/multiproc_gpu_executor.py", line 155, in _driver_execute_model
    return self.driver_worker.execute_model(execute_model_req)
  File "/home/aac/apps/rocm611-0929/vllm-fix-spec-amd/vllm/worker/worker_base.py", line 327, in execute_model
    output = self.model_runner.execute_model(
  File "/home/aac/anaconda3/envs/rocm611-0929/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/aac/apps/rocm611-0929/vllm-fix-spec-amd/vllm/worker/multi_step_model_runner.py", line 507, in execute_model
    model_input = self._advance_step(
  File "/home/aac/apps/rocm611-0929/vllm-fix-spec-amd/vllm/worker/multi_step_model_runner.py", line 634, in _advance_step
    attn_metadata.advance_step(
TypeError: advance_step() got an unexpected keyword argument 'turn_prefills_into_decodes'
```

The worker process log shows a fragment of the same traceback:

```text
(VllmWorkerProcess pid=207827) ERROR 10-06 16:38:45 multiproc_worker_utils.py:231
  File "/home/aac/apps/rocm611-0929/vllm-fix-spec-amd/vllm/worker/worker_base.py", line 327, in execute_model
```
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.