Labels
bug (Something isn't working), ci-failure (Issue about an unexpected test failure in CI)
Description
Your current environment
Still failing on main as of commit bca55b5
🐛 Describe the bug
FAILED weight_loading/test_weight_loading.py::test_weight_loading - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_0': 1}
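For local triage, a minimal reproduction sketch of the failing path is below. It assumes only that the crash is tied to the tensor_parallel_size=2 engine startup shown in the traceback (KV-cache profiling via _dummy_sampler_run), not to one particular checkpoint; the model name and max_model_len are illustrative stand-ins, since the failing MODEL_NAME / MAX_MODEL_LEN parametrization is not visible in this log excerpt.

import os

# The CUDA error is reported asynchronously; force synchronous kernel launches
# so the stack trace points at the offending launch, as the log recommends.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from vllm import LLM

# Constructing the engine is enough: initialization runs
# determine_available_memory() -> profile_run() -> _dummy_sampler_run(),
# which is where the illegal memory access is raised in the CI log.
llm = LLM(
    model="nm-testing/test-w4a16-mixtral-actorder-group",  # illustrative; one of the test's parametrized models, not confirmed as the failing case
    dtype="half",            # the test forces fp16 for this MoE checkpoint
    max_model_len=4096,      # stand-in for the test's MAX_MODEL_LEN
    tensor_parallel_size=2,  # matches the failing configuration
)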
Logs
[2025-05-20T10:46:52Z] (VllmWorker rank=0 pid=12189) INFO 05-20 03:46:52 [backends.py:172] Compiling a graph for general shape takes 20.71 s
[2025-05-20T10:46:52Z] (VllmWorker rank=0 pid=12189) DEBUG 05-20 03:46:52 [backends.py:512] Computation graph saved to /root/.cache/vllm/torch_compile_cache/07e0a984e7/rank_0_0/computation_graph.py
[2025-05-20T10:46:55Z] (VllmWorker rank=1 pid=12191) DEBUG 05-20 03:46:55 [wrapper.py:105] Dynamo transformed code saved to /root/.cache/vllm/torch_compile_cache/07e0a984e7/rank_1_0/transformed_code.py
[2025-05-20T10:46:55Z] (VllmWorker rank=0 pid=12189) DEBUG 05-20 03:46:55 [wrapper.py:105] Dynamo transformed code saved to /root/.cache/vllm/torch_compile_cache/07e0a984e7/rank_0_0/transformed_code.py
[2025-05-20T10:46:57Z] DEBUG 05-20 03:46:57 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:47:04Z] (VllmWorker rank=1 pid=12191) INFO 05-20 03:47:04 [monitor.py:33] torch.compile takes 26.93 s in total
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) INFO 05-20 03:47:04 [monitor.py:33] torch.compile takes 27.04 s in total
[2025-05-20T10:47:04Z] [rank0]:[E520 03:47:04.619668070 ProcessGroupNCCL.cpp:1896] [PG ID 2 PG GUID 3 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
[2025-05-20T10:47:04Z] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T10:47:04Z] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T10:47:04Z] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
[2025-05-20T10:47:04Z] frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f81329785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:47:04Z] frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xe0 (0x7f813290d4a2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:47:04Z] frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3c2 (0x7f8132d26422 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
[2025-05-20T10:47:04Z] frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f80c268b456 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x70 (0x7f80c269b6f0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x782 (0x7f80c269d282 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f80c269ee8d in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #7: <unknown function> + 0xdc253 (0x7f80b29b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
[2025-05-20T10:47:04Z] frame #8: <unknown function> + 0x94ac3 (0x7f81b3fb9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:47:04Z] frame #9: clone + 0x44 (0x7f81b404aa04 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] terminate called after throwing an instance of 'c10::DistBackendError'
[2025-05-20T10:47:04Z] what(): [PG ID 2 PG GUID 3 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
[2025-05-20T10:47:04Z] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T10:47:04Z] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T10:47:04Z] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
[2025-05-20T10:47:04Z] frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f81329785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:47:04Z] frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xe0 (0x7f813290d4a2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:47:04Z] frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3c2 (0x7f8132d26422 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
[2025-05-20T10:47:04Z] frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f80c268b456 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x70 (0x7f80c269b6f0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x782 (0x7f80c269d282 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f80c269ee8d in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #7: <unknown function> + 0xdc253 (0x7f80b29b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
[2025-05-20T10:47:04Z] frame #8: <unknown function> + 0x94ac3 (0x7f81b3fb9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:47:04Z] frame #9: clone + 0x44 (0x7f81b404aa04 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Exception raised from ncclCommWatchdog at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1902 (most recent call first):
[2025-05-20T10:47:04Z] frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f81329785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:47:04Z] frame #1: <unknown function> + 0xcc7a4e (0x7f80c266da4e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #2: <unknown function> + 0x9165ed (0x7f80c22bc5ed in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #3: <unknown function> + 0xdc253 (0x7f80b29b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
[2025-05-20T10:47:04Z] frame #4: <unknown function> + 0x94ac3 (0x7f81b3fb9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:47:04Z] frame #5: clone + 0x44 (0x7f81b404aa04 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Fatal Python error: Aborted
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Thread 0x00007f814f5da640 (most recent call first):
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 359 in wait
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 655 in wait
[2025-05-20T10:47:04Z] File "/usr/local/lib/python3.12/dist-packages/tqdm/_monitor.py", line 60 in run
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 1032 in _bootstrap
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Thread 0x00007f81549dc640 (most recent call first):
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 359 in wait
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 655 in wait
[2025-05-20T10:47:04Z] File "/usr/local/lib/python3.12/dist-packages/tqdm/_monitor.py", line 60 in run
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 1032 in _bootstrap
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Thread 0x00007f8166fe3640 (most recent call first):
[2025-05-20T10:47:04Z] File "/usr/local/lib/python3.12/dist-packages/vllm/usage/usage_lib.py", line 229 in _report_continuous_usage
[2025-05-20T10:47:04Z] File "/usr/local/lib/python3.12/dist-packages/vllm/usage/usage_lib.py", line 164 in _report_usage_worker
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 1012 in run
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 1032 in _bootstrap
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Thread 0x00007f81b3f24000 (most recent call first):
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/logging/__init__.py", line 720 in format
[2025-05-20T10:47:04Z] File (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] WorkerProc hit an exception.
[2025-05-20T10:47:04Z] Fatal Python error: Segmentation fault
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] Traceback (most recent call last):
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] output = func(*args, **kwargs)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] return func(*args, **kwargs)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 185, in determine_available_memory
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] self.model_runner.profile_run()
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1856, in profile_run
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] sampler_output = self._dummy_sampler_run(hidden_states)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] return func(*args, **kwargs)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1757, in _dummy_sampler_run
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] raise e
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1747, in _dummy_sampler_run
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] sampler_output = self.sampler(logits=logits,
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 49, in forward
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] sampled = self.sample(logits, sampling_metadata)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 125, in sample
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] sampled = torch.where(
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Extension modules: (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T10:47:04Z] zstandard.backend_c(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522]
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] Traceback (most recent call last):
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
[2025-05-20T10:47:04Z] , (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] output = func(*args, **kwargs)
[2025-05-20T10:47:04Z] charset_normalizer.md(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] return func(*args, **kwargs)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 185, in determine_available_memory
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] self.model_runner.profile_run()
[2025-05-20T10:47:04Z] , regex._regex(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1856, in profile_run
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] sampler_output = self._dummy_sampler_run(hidden_states)
[2025-05-20T10:47:04Z] , (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] numpy.core._multiarray_umath(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] return func(*args, **kwargs)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] , numpy.core._multiarray_tests(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1757, in _dummy_sampler_run
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] raise e
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1747, in _dummy_sampler_run
[2025-05-20T10:47:04Z] , numpy.linalg._umath_linalg(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] sampler_output = self.sampler(logits=logits,
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[2025-05-20T10:47:04Z] , numpy.fft._pocketfft_internal(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
[2025-05-20T10:47:04Z] , numpy.random._common(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[2025-05-20T10:47:04Z] , numpy.random.bit_generator(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
[2025-05-20T10:47:04Z] , (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] numpy.random._bounded_integers(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 49, in forward
[2025-05-20T10:47:04Z] , numpy.random._mt19937(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] sampled = self.sample(logits, sampling_metadata)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] , (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 125, in sample
[2025-05-20T10:47:04Z] numpy.random.mtrand(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] sampled = torch.where(
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^
[2025-05-20T10:47:04Z] , (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-20T10:47:04Z] numpy.random._philox(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T10:47:04Z] , numpy.random._pcg64(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T10:47:04Z] , numpy.random._sfc64(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522]
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522]
[2025-05-20T10:47:04Z] , numpy.random._generator, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, yaml._yaml, PIL._imaging, markupsafe._speedups, sklearn.__check_build._check_build, psutil._psutil_linux, psutil._psutil_posix, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._cdflib, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._ansari_swilk_statistics, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.stats._unuran.unuran_wrapper, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, 
pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, numexpr.interpreter, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, zmq.backend.cython._zmq, PIL._imagingft, hiredis.hiredis, msgspec._core, _cffi_backend, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._helpers, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket, frozenlist._frozenlist, msgpack._cmsgpack, google._upb._message, setproctitle, uvloop.loop, ray._raylet, sentencepiece._sentencepiece, vllm.cumem_allocator, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.typing.builtins.itertools, numba.cpython.builtins.math, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, cuda_utils, __triton_launcher (total: 231)
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] EngineCore failed to start.
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] Traceback (most recent call last):
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 480, in run_engine_core
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] engine_core = EngineCoreProc(*args, **kwargs)
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 379, in __init__
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] super().__init__(vllm_config, executor_class, log_stats,
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 74, in __init__
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] self._initialize_kv_caches(vllm_config)
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 133, in _initialize_kv_caches
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] available_gpu_memory = self.model_executor.determine_available_memory()
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in determine_available_memory
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] output = self.collective_rpc("determine_available_memory")
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 215, in collective_rpc
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] result = get_response(w, dequeue_timeout)
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 202, in get_response
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] raise RuntimeError(
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] RuntimeError: Worker failed with error 'CUDA error: an illegal memory access was encountered
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] ', please check the stack trace above for the root cause
[2025-05-20T10:47:07Z] DEBUG 05-20 03:47:07 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:47:17Z] DEBUG 05-20 03:47:17 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:47:27Z] DEBUG 05-20 03:47:27 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:47:37Z] DEBUG 05-20 03:47:37 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:47:47Z] DEBUG 05-20 03:47:47 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:47:57Z] DEBUG 05-20 03:47:57 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:48:04Z] (VllmWorker rank=1 pid=12191) DEBUG 05-20 03:48:04 [shm_broadcast.py:430] No available shared memory broadcast block found in 60 second.
[2025-05-20T10:48:08Z] DEBUG 05-20 03:48:08 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:48:18Z] DEBUG 05-20 03:48:18 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:48:28Z] DEBUG 05-20 03:48:28 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:48:38Z] DEBUG 05-20 03:48:38 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:48:48Z] DEBUG 05-20 03:48:48 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:48:58Z] DEBUG 05-20 03:48:58 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:49:04Z] (VllmWorker rank=1 pid=12191) DEBUG 05-20 03:49:04 [shm_broadcast.py:430] No available shared memory broadcast block found in 60 second.
[2025-05-20T10:49:08Z] DEBUG 05-20 03:49:08 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:49:18Z] DEBUG 05-20 03:49:18 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:49:28Z] DEBUG 05-20 03:49:28 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:49:38Z] DEBUG 05-20 03:49:38 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:49:48Z] DEBUG 05-20 03:49:48 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:49:58Z] DEBUG 05-20 03:49:58 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:50:04Z] (VllmWorker rank=1 pid=12191) DEBUG 05-20 03:50:04 [shm_broadcast.py:430] No available shared memory broadcast block found in 60 second.
[2025-05-20T10:50:08Z] DEBUG 05-20 03:50:08 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:50:18Z] DEBUG 05-20 03:50:18 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:50:28Z] DEBUG 05-20 03:50:28 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:50:38Z] DEBUG 05-20 03:50:38 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:50:48Z] DEBUG 05-20 03:50:48 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:50:58Z] DEBUG 05-20 03:50:58 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:51:04Z] (VllmWorker rank=1 pid=12191) DEBUG 05-20 03:51:04 [shm_broadcast.py:430] No available shared memory broadcast block found in 60 second.
[2025-05-20T10:51:08Z] DEBUG 05-20 03:51:08 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:51:18Z] DEBUG 05-20 03:51:18 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:51:28Z] DEBUG 05-20 03:51:28 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:51:38Z] DEBUG 05-20 03:51:38 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:51:48Z] DEBUG 05-20 03:51:48 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:51:58Z] DEBUG 05-20 03:51:58 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:52:04Z] (VllmWorker rank=1 pid=12191) DEBUG 05-20 03:52:04 [shm_broadcast.py:430] No available shared memory broadcast block found in 60 second.
[2025-05-20T10:52:05Z] [rank1]:[W520 03:52:05.308296965 TCPStore.cpp:125] [c10d] recvValue failed on SocketImpl(fd=135, addr=[localhost]:59310, remote=[localhost]:44713): Connection reset by peer
[2025-05-20T10:52:05Z] Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:675 (most recent call first):
[2025-05-20T10:52:05Z] frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f81329785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:52:05Z] frame #1: <unknown function> + 0x5ba8afe (0x7f8116a3cafe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:05Z] frame #2: <unknown function> + 0x5baaecf (0x7f8116a3eecf in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:05Z] frame #3: <unknown function> + 0x5bab74a (0x7f8116a3f74a in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:05Z] frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x2a9 (0x7f8116a391a9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:05Z] frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7f80c2699989 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:52:05Z] frame #6: <unknown function> + 0xdc253 (0x7f80b29b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
[2025-05-20T10:52:05Z] frame #7: <unknown function> + 0x94ac3 (0x7f81b3fb9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:05Z] frame #8: clone + 0x44 (0x7f81b404aa04 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:05Z]
[2025-05-20T10:52:05Z] [rank1]:[W520 03:52:05.311880662 ProcessGroupNCCL.cpp:1659] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Connection reset by peer
[2025-05-20T10:52:05Z] ERROR 05-20 03:52:05 [multiproc_executor.py:135] Worker proc VllmWorker-0 died unexpectedly, shutting down executor.
[2025-05-20T10:52:06Z] [rank1]:[W520 03:52:06.312051397 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=135, addr=[localhost]:59310, remote=[localhost]:44713): Broken pipe
[2025-05-20T10:52:06Z] Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
[2025-05-20T10:52:06Z] frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f81329785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:52:06Z] frame #1: <unknown function> + 0x5ba8afe (0x7f8116a3cafe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:06Z] frame #2: <unknown function> + 0x5baa358 (0x7f8116a3e358 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:06Z] frame #3: <unknown function> + 0x5babb3e (0x7f8116a3fb3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:06Z] frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7f8116a39198 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:06Z] frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7f80c2699989 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:52:06Z] frame #6: <unknown function> + 0xdc253 (0x7f80b29b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
[2025-05-20T10:52:06Z] frame #7: <unknown function> + 0x94ac3 (0x7f81b3fb9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:06Z] frame #8: clone + 0x44 (0x7f81b404aa04 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:06Z]
[2025-05-20T10:52:06Z] [rank1]:[W520 03:52:06.315138190 ProcessGroupNCCL.cpp:1659] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[2025-05-20T10:52:07Z] [rank1]:[W520 03:52:07.315243724 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=135, addr=[localhost]:59310, remote=[localhost]:44713): Broken pipe
[2025-05-20T10:52:07Z] Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
[2025-05-20T10:52:07Z] frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f81329785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:52:07Z] frame #1: <unknown function> + 0x5ba8afe (0x7f8116a3cafe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:07Z] frame #2: <unknown function> + 0x5baa358 (0x7f8116a3e358 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:07Z] frame #3: <unknown function> + 0x5babb3e (0x7f8116a3fb3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:07Z] frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7f8116a39198 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:07Z] frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7f80c2699989 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:52:07Z] frame #6: <unknown function> + 0xdc253 (0x7f80b29b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
[2025-05-20T10:52:07Z] frame #7: <unknown function> + 0x94ac3 (0x7f81b3fb9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:07Z] frame #8: clone + 0x44 (0x7f81b404aa04 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:07Z]
[2025-05-20T10:52:07Z] [rank1]:[W520 03:52:07.317919386 ProcessGroupNCCL.cpp:1659] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[2025-05-20T10:52:08Z] DEBUG 05-20 03:52:08 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:52:08Z] [rank1]:[W520 03:52:08.318060510 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=135, addr=[localhost]:59310, remote=[localhost]:44713): Broken pipe
[2025-05-20T10:52:08Z] Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
[2025-05-20T10:52:08Z] frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f81329785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:52:08Z] frame #1: <unknown function> + 0x5ba8afe (0x7f8116a3cafe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:08Z] frame #2: <unknown function> + 0x5baa358 (0x7f8116a3e358 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:08Z] frame #3: <unknown function> + 0x5babb3e (0x7f8116a3fb3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:08Z] frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7f8116a39198 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:08Z] frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7f80c2699989 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:52:08Z] frame #6: <unknown function> + 0xdc253 (0x7f80b29b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
[2025-05-20T10:52:08Z] frame #7: <unknown function> + 0x94ac3 (0x7f81b3fb9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:08Z] frame #8: clone + 0x44 (0x7f81b404aa04 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:08Z]
[2025-05-20T10:52:08Z] [rank1]:[W520 03:52:08.321126013 ProcessGroupNCCL.cpp:1659] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[2025-05-20T10:52:09Z] [rank1]:[W520 03:52:09.321236826 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=135, addr=[localhost]:59310, remote=[localhost]:44713): Broken pipe
[2025-05-20T10:52:09Z] Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
[2025-05-20T10:52:09Z] frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f81329785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:52:09Z] frame #1: <unknown function> + 0x5ba8afe (0x7f8116a3cafe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:09Z] frame #2: <unknown function> + 0x5baa358 (0x7f8116a3e358 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:09Z] frame #3: <unknown function> + 0x5babb3e (0x7f8116a3fb3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:09Z] frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7f8116a39198 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:09Z] frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7f80c2699989 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:52:09Z] frame #6: <unknown function> + 0xdc253 (0x7f80b29b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
[2025-05-20T10:52:09Z] frame #7: <unknown function> + 0x94ac3 (0x7f81b3fb9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:09Z] frame #8: clone + 0x44 (0x7f81b404aa04 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:09Z]
[2025-05-20T10:52:09Z] [rank1]:[W520 03:52:09.323951540 ProcessGroupNCCL.cpp:1659] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[2025-05-20T10:52:09Z] Process EngineCore_0:
[2025-05-20T10:52:09Z] Traceback (most recent call last):
[2025-05-20T10:52:09Z] File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
[2025-05-20T10:52:09Z] self.run()
[2025-05-20T10:52:09Z] File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
[2025-05-20T10:52:09Z] self._target(*self._args, **self._kwargs)
[2025-05-20T10:52:09Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 493, in run_engine_core
[2025-05-20T10:52:09Z] raise e
[2025-05-20T10:52:09Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 480, in run_engine_core
[2025-05-20T10:52:09Z] engine_core = EngineCoreProc(*args, **kwargs)
[2025-05-20T10:52:09Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:52:09Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 379, in __init__
[2025-05-20T10:52:09Z] super().__init__(vllm_config, executor_class, log_stats,
[2025-05-20T10:52:09Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 74, in __init__
[2025-05-20T10:52:09Z] self._initialize_kv_caches(vllm_config)
[2025-05-20T10:52:09Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 133, in _initialize_kv_caches
[2025-05-20T10:52:09Z] available_gpu_memory = self.model_executor.determine_available_memory()
[2025-05-20T10:52:09Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:52:09Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in determine_available_memory
[2025-05-20T10:52:09Z] output = self.collective_rpc("determine_available_memory")
[2025-05-20T10:52:09Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:52:09Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 215, in collective_rpc
[2025-05-20T10:52:09Z] result = get_response(w, dequeue_timeout)
[2025-05-20T10:52:09Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:52:09Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 202, in get_response
[2025-05-20T10:52:09Z] raise RuntimeError(
[2025-05-20T10:52:09Z] RuntimeError: Worker failed with error 'CUDA error: an illegal memory access was encountered
[2025-05-20T10:52:09Z] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T10:52:09Z] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T10:52:09Z] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T10:52:09Z] ', please check the stack trace above for the root cause
[2025-05-20T10:52:10Z] /usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 3 leaked shared_memory objects to clean up at shutdown
[2025-05-20T10:52:10Z] warnings.warn('resource_tracker: There appear to be %d '
[2025-05-20T10:52:10Z] F
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] =================================== FAILURES ===================================
[2025-05-20T10:52:10Z] _____________________________ test_weight_loading ______________________________
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] vllm_runner = <class 'tests.conftest.VllmRunner'>
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] @pytest.mark.skipif(
[2025-05-20T10:52:10Z] MODEL_NAME == "casperhansen/deepseek-coder-v2-instruct-awq",
[2025-05-20T10:52:10Z] reason="OOM in the CI")
[2025-05-20T10:52:10Z] @pytest.mark.skipif(
[2025-05-20T10:52:10Z] not current_platform.has_device_capability(int(MIN_CAPABILITY)),
[2025-05-20T10:52:10Z] reason="Current system does not have minimum capability.")
[2025-05-20T10:52:10Z] def test_weight_loading(vllm_runner):
[2025-05-20T10:52:10Z] """
[2025-05-20T10:52:10Z] Test parameter weight loading with tp>1.
[2025-05-20T10:52:10Z] """
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] # MoE models need fp16.
[2025-05-20T10:52:10Z] NEEDS_FP16 = (QUANTIZATION == "gptq" or MODEL_NAME
[2025-05-20T10:52:10Z] == "nm-testing/test-w4a16-mixtral-actorder-group")
[2025-05-20T10:52:10Z] > with vllm_runner(
[2025-05-20T10:52:10Z] model_name=MODEL_NAME,
[2025-05-20T10:52:10Z] revision=REVISION,
[2025-05-20T10:52:10Z] dtype=torch.half if NEEDS_FP16 else "auto",
[2025-05-20T10:52:10Z] quantization=None if QUANTIZATION == "None" else QUANTIZATION,
[2025-05-20T10:52:10Z] max_model_len=MAX_MODEL_LEN,
[2025-05-20T10:52:10Z] tensor_parallel_size=2) as model:
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] weight_loading/test_weight_loading.py:32:
[2025-05-20T10:52:10Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2025-05-20T10:52:10Z] conftest.py:762: in __init__
[2025-05-20T10:52:10Z] self.model = LLM(
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/utils.py:1177: in inner
[2025-05-20T10:52:10Z] return fn(*args, **kwargs)
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:250: in __init__
[2025-05-20T10:52:10Z] self.llm_engine = LLMEngine.from_engine_args(
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py:511: in from_engine_args
[2025-05-20T10:52:10Z] return engine_cls.from_vllm_config(
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/v1/engine/llm_engine.py:115: in from_vllm_config
[2025-05-20T10:52:10Z] return cls(vllm_config=vllm_config,
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/v1/engine/llm_engine.py:92: in __init__
[2025-05-20T10:52:10Z] self.engine_core = EngineCoreClient.make_client(
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:75: in make_client
[2025-05-20T10:52:10Z] return SyncMPClient(vllm_config, executor_class, log_stats)
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:580: in __init__
[2025-05-20T10:52:10Z] super().__init__(
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:418: in __init__
[2025-05-20T10:52:10Z] self._wait_for_engine_startup(output_address, parallel_config)
[2025-05-20T10:52:10Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] self = <vllm.v1.engine.core_client.SyncMPClient object at 0x7f7f5ea652e0>
[2025-05-20T10:52:10Z] output_address = 'ipc:///tmp/d003abd2-4e16-42b2-9050-bf1e9dc8d357'
[2025-05-20T10:52:10Z] parallel_config = ParallelConfig(pipeline_parallel_size=1, tensor_parallel_size=2, data_parallel_size=1, data_parallel_size_local=1, dat...p', worker_cls='vllm.v1.worker.gpu_worker.Worker', sd_worker_cls='auto', worker_extension_cls='', world_size=2, rank=0)
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] def _wait_for_engine_startup(self, output_address: str,
[2025-05-20T10:52:10Z] parallel_config: ParallelConfig):
[2025-05-20T10:52:10Z] # Get a sync handle to the socket which can be sync or async.
[2025-05-20T10:52:10Z] sync_input_socket = zmq.Socket.shadow(self.input_socket)
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] # Wait for engine core process(es) to send ready messages.
[2025-05-20T10:52:10Z] local_count = parallel_config.data_parallel_size_local
[2025-05-20T10:52:10Z] remote_count = len(self.core_engines) - local_count
[2025-05-20T10:52:10Z] # [local, remote] counts
[2025-05-20T10:52:10Z] conn_pending, start_pending = [local_count, remote_count], [0, 0]
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] poller = zmq.Poller()
[2025-05-20T10:52:10Z] poller.register(sync_input_socket, zmq.POLLIN)
[2025-05-20T10:52:10Z] proc_manager = self.resources.local_engine_manager
[2025-05-20T10:52:10Z] if proc_manager is not None:
[2025-05-20T10:52:10Z] for sentinel in proc_manager.sentinels():
[2025-05-20T10:52:10Z] poller.register(sentinel, zmq.POLLIN)
[2025-05-20T10:52:10Z] while any(conn_pending) or any(start_pending):
[2025-05-20T10:52:10Z] events = poller.poll(STARTUP_POLL_PERIOD_MS)
[2025-05-20T10:52:10Z] if not events:
[2025-05-20T10:52:10Z] if any(conn_pending):
[2025-05-20T10:52:10Z] logger.debug(
[2025-05-20T10:52:10Z] "Waiting for %d local, %d remote core engine proc(s) "
[2025-05-20T10:52:10Z] "to connect.", *conn_pending)
[2025-05-20T10:52:10Z] if any(start_pending):
[2025-05-20T10:52:10Z] logger.debug(
[2025-05-20T10:52:10Z] "Waiting for %d local, %d remote core engine proc(s) "
[2025-05-20T10:52:10Z] "to start.", *start_pending)
[2025-05-20T10:52:10Z] continue
[2025-05-20T10:52:10Z] if len(events) > 1 or events[0][0] != sync_input_socket:
[2025-05-20T10:52:10Z] # One of the local core processes exited.
[2025-05-20T10:52:10Z] finished = proc_manager.finished_procs(
[2025-05-20T10:52:10Z] ) if proc_manager else {}
[2025-05-20T10:52:10Z] > raise RuntimeError("Engine core initialization failed. "
[2025-05-20T10:52:10Z] "See root cause above. "
[2025-05-20T10:52:10Z] f"Failed core proc(s): {finished}")
[2025-05-20T10:52:10Z] E RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_0': 1}
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:484: RuntimeError
Metadata
Status
Done