[BugFix] Fix Memory Leak #17567
Conversation
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: [email protected] <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small and essential subset of CI tests runs automatically to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 …
Signed-off-by: [email protected] <[email protected]>
Thanks for locating the memory leak and the fix!
vllm/v1/core/sched/scheduler.py (outdated)
```python
# NOTE(rob): since we free stopped reqs above, adding stopped reqs
# to _cached_reqs_data will cause a memory leak.
if req_data.req_id not in stopped_set:
    self._cached_reqs_data[req_data.req_id].append(req_data)
```
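For context, here is a minimal sketch of the leak pattern and the fix. The `free_request`/`recycle` helper names are hypothetical simplifications, not the actual scheduler API, and it assumes `_cached_reqs_data` is a `defaultdict(deque)`, as the append pattern above suggests:

```python
from collections import defaultdict, deque

_cached_reqs_data = defaultdict(deque)  # long-lived scheduler state, keyed by request id
finished_req_ids = set()

def free_request(req_id):
    # Freeing a stopped request removes its cached entry...
    _cached_reqs_data.pop(req_id, None)
    finished_req_ids.add(req_id)

def recycle(req_id, req_data):
    # ...but unconditionally appending afterwards resurrects the key,
    # and nothing ever pops it again, so the dict grows without bound.
    _cached_reqs_data[req_id].append(req_data)

def recycle_fixed(req_id, req_data):
    # The fix: skip requests that were already freed this step.
    if req_id not in finished_req_ids:
        _cached_reqs_data[req_id].append(req_data)

free_request("req-1")
recycle("req-1", object())
assert "req-1" in _cached_reqs_data  # leaked: the entry was resurrected

free_request("req-2")
recycle_fixed("req-2", object())
assert "req-2" not in _cached_reqs_data  # fixed: no resurrected entry
```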
Can we just do the following so that we don't need to introduce `stopped_set`?

```python
if req_data.req_id in self._cached_reqs_data:
    self._cached_reqs_data[req_data.req_id].append(req_data)
```
I don't think that would work unless line 541 is changed from `self._cached_reqs_data.get(request.request_id)` to `self._cached_reqs_data[request.request_id]`.

Alternatively I think you could use `if req_data.req_id not in self.finished_req_ids` to avoid having a new set.
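The subtlety behind this exchange is a `defaultdict` side effect: subscripting inserts a default entry, while `.get()` does not, so the proposed `in` guard only ever fires if the read path uses `[...]`. A quick illustration (again assuming `_cached_reqs_data` is a `defaultdict(deque)`; `cache` and the request ids are made up for the example):

```python
from collections import defaultdict, deque

cache = defaultdict(deque)

cache.get("req-1")       # no side effect: the key is still absent
print("req-1" in cache)  # False -> an `in` guard would never cache this request

cache["req-2"]           # subscript access inserts an empty deque as a side effect
print("req-2" in cache)  # True -> now the `in` guard would allow caching
```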
good idea
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: [email protected] <[email protected]>
Any plans to fix this in V0? Batching with Phi mini (3.5 and 4) with `lora_enabled` on V0 showed degraded performance yesterday, with responses showing signs of context from other prompts.
@nightflight-dk Sorry for the sidetrack, but I wanted to clarify since I don't understand the issue well here: do you mean that chunked prefill, if enabled in V0, potentially leads to prompt mixing? There was a phenomenon like that for prefix caching due to the hash function's behaviour in Python 3.12 (#12621), but it got fixed for me in …
Does this mean that if I upgrade vLLM to v0.8.5.post1 and use the "--enable-prefix-caching" option when starting a model service, it will no longer have the memory leak?
Signed-off-by: [email protected] <[email protected]> Signed-off-by: Mu Huai <[email protected]>
Syncing midstream NM fork to upstream tag [v0.8.5.post1](https://github.com/vllm-project/vllm/tree/v0.8.5.post1), plus:
- cherry pick of vllm-project@be633fb needed for benchmarks
- [CP](neuralmagic/nm-vllm-ent@1fe447d) for compressed tensor bump
- [CP](vllm-project#17677) for lora on AMD
- [CP](vllm-project#17315) for llama4 w/ pure dense layers

```
commit 31c73ba (HEAD -> upstream-v0.8.5, nm-fork/upstream-v0.8.5)
Author: Chauncey <[email protected]>
Date:   Wed Apr 30 15:11:04 2025 +0800

    [Bugfix] Fix AttributeError: 'State' object has no attribute 'engine_client' (vllm-project#17434)

    Signed-off-by: chaunceyjiang <[email protected]>

commit f8db0bd
Author: Lucas Wilkinson <[email protected]>
Date:   Fri May 2 14:01:38 2025 -0400

    [BugFix][Attention] Fix sliding window attention in V1 giving incorrect results (vllm-project#17574)

    Signed-off-by: Lucas Wilkinson <[email protected]>

commit e335c34
Author: Robert Shaw <[email protected]>
Date:   Fri May 2 04:07:03 2025 -0400

    [BugFix] Fix Memory Leak (vllm-project#17567)

    Signed-off-by: [email protected] <[email protected]>

commit cc463fe
Merge: 1e358ff ba41cc9
Author: Selbi Nuryyeva <[email protected]>
Date:   Tue Apr 29 12:34:57 2025 -0400

    Merge branch 'tag-upstream-v0.8.5' into upstream-v0.8.5

commit ba41cc9 (tag: v0.8.5, tag-upstream-v0.8.5)
Author: Michael Goin <[email protected]>
Date:   Mon Apr 28 16:20:24 2025 -0600

    [Model] Add tuned triton fused_moe configs for Qwen3Moe (vllm-project#17328)

    Signed-off-by: mgoin <[email protected]>

commit dcbac4c
Author: Simon Mo <[email protected]>
Date:   Mon Apr 28 14:12:01 2025 -0700

    [Model] Qwen3 Dense FP8 Compat Fixes (vllm-project#17318)

    Signed-off-by: simon-mo <[email protected]>

[...]
```

Commands:
```
git fetch upstream
git checkout -b upstream-v0.8.5
git merge upstream/v0.8.5
git cherry-pick be633fb
```

TEST PLAN
- accept sync: https://github.com/neuralmagic/nm-cicd/actions/runs/14841223552
- related PR in cicd: neuralmagic/nm-cicd#99
- release workflow: https://github.com/neuralmagic/nm-cicd/actions/runs/14845693864
This issue does not impact V0. It's related to the V1 scheduler.
Signed-off-by: [email protected] <[email protected]> Signed-off-by: Yuqi Zhang <[email protected]>
Signed-off-by: [email protected] <[email protected]> Signed-off-by: minpeter <[email protected]>