[Bugfix][V1] Fix FlashInfer V1 backend using the wrong VllmConfig #18086
FIX #17483 (comment)
Thanks to @chenyang78 for reporting and @heheda12345 for the fix.
The global vllm config may not be set by `set_current_vllm_config`, so we should read it from the runner directly: `self.vllm_config = runner.vllm_config`. This is already what FlashInfer V0 does (see `vllm/attention/backends/flashinfer.py`, line 200 at commit 19324d6), so this was likely just an oversight.
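The snippets below are a minimal sketch of the change, not the exact diff: the surrounding class is abbreviated, and they assume the V1 metadata builder previously fetched the global config via `get_current_vllm_config()`.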
Before:
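```python
from vllm.config import get_current_vllm_config

class FlashInferMetadataBuilder:  # abbreviated sketch
    def __init__(self, runner):
        # Reads the *global* config, which is only populated inside a
        # set_current_vllm_config() context; outside that context this
        # falls back to a default VllmConfig instead of the runner's.
        self.vllm_config = get_current_vllm_config()
```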
After:
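```python
class FlashInferMetadataBuilder:  # abbreviated sketch
    def __init__(self, runner):
        # Read the config straight off the runner, mirroring what the
        # FlashInfer V0 backend already does; no dependency on the
        # set_current_vllm_config() context being active.
        self.vllm_config = runner.vllm_config
```

Reading the config from the runner removes the implicit requirement that backend construction happen inside a `set_current_vllm_config` context.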