[BugFix] Avoid secondary error in ShmRingBuffer destructor #6530

Merged
youkaichao merged 1 commit into main from shm-destructor
Jul 18, 2024

njhill (Member) commented Jul 18, 2024

If there is an error setting up the shared memory in the ShmRingBuffer initializer, the destructor will raise an additional error:

(RayWorkerWrapper pid=371, ip=10.128.21.6) Exception ignored in: <function ShmRingBuffer.__del__ at 0x7f6ce44258a0>
(RayWorkerWrapper pid=371, ip=10.128.21.6) Traceback (most recent call last):
(RayWorkerWrapper pid=371, ip=10.128.21.6)   File "/opt/vllm/lib64/python3.11/site-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 122, in __del__
(RayWorkerWrapper pid=371, ip=10.128.21.6)     self.shared_memory.close()
(RayWorkerWrapper pid=371, ip=10.128.21.6)     ^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=371, ip=10.128.21.6) AttributeError: 'ShmRingBuffer' object has no attribute 'shared_memory'
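
The failure mode described above can be reproduced outside vLLM. The following is a minimal sketch, not the diff merged in this PR: `UnguardedBuffer`, `GuardedBuffer`, and `unraisable_from_failed_init` are hypothetical names standing in for `ShmRingBuffer`, and the `hasattr` check is just one common way to keep `__del__` safe when `__init__` fails before `shared_memory` is assigned.

```python
import gc
import sys
from multiprocessing import shared_memory


class UnguardedBuffer:
    """Hypothetical stand-in: __del__ assumes __init__ ran to completion."""

    def __init__(self, size: int):
        # SharedMemory(create=True, size=0) raises ValueError, so the
        # attribute below is never assigned when size is 0.
        self.shared_memory = shared_memory.SharedMemory(create=True, size=size)

    def __del__(self):
        # AttributeError here if __init__ failed before the assignment.
        self.shared_memory.close()
        self.shared_memory.unlink()


class GuardedBuffer(UnguardedBuffer):
    """Same initializer, but the destructor tolerates a partial object."""

    def __del__(self):
        # Only clean up what was actually created.
        if hasattr(self, "shared_memory"):
            self.shared_memory.close()
            self.shared_memory.unlink()


def unraisable_from_failed_init(cls):
    """Return the exception types that leak out of __del__ after a failed init."""
    seen = []
    previous_hook = sys.unraisablehook
    # Record only the type, so the hook does not keep the object alive.
    sys.unraisablehook = lambda args: seen.append(args.exc_type)
    try:
        try:
            cls(0)  # the initializer fails with ValueError
        except ValueError:
            pass
        gc.collect()  # make sure the half-built object is finalized
    finally:
        sys.unraisablehook = previous_hook
    return seen
```

On CPython, calling `unraisable_from_failed_init(UnguardedBuffer)` should report an `AttributeError` (the "Exception ignored in ... `__del__`" case from the log), while `GuardedBuffer` should report nothing. Either way, the original `ValueError` from the initializer still propagates to the caller; the guard only removes the misleading secondary error.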

njhill requested a review from youkaichao July 18, 2024 01:21

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they trigger only the fastcheck CI, which consists of a small, essential subset of tests to catch errors quickly, with the flexibility to run extra individual tests on top (you can do this by unblocking test steps in the Buildkite run).

A full CI run is still required to merge this PR, so once the PR is ready to go, please make sure to run it. If you need all test signals in between PR commits, you can trigger full CI as well.

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

youkaichao (Member) left a comment

Thanks for the fix!

youkaichao enabled auto-merge (squash) July 18, 2024 04:04
github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jul 18, 2024
youkaichao disabled auto-merge July 18, 2024 05:24
youkaichao merged commit d25877d into main Jul 18, 2024
84 of 87 checks passed
youkaichao deleted the shm-destructor branch July 18, 2024 05:24
fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 19, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Mar 26, 2025