Labels
bug (Something isn't working), ci-failure (Issue about an unexpected test failure in CI)
Description
Your current environment
Still failing on main as of commit bca55b5
🐛 Describe the bug
FAILED weight_loading/test_weight_loading.py::test_weight_loading - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_0': 1}
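For local triage, a minimal reproduction sketch of the failing path is below. It assumes only that the crash is tied to the tensor_parallel_size=2 engine startup shown in the traceback (KV-cache profiling via _dummy_sampler_run), not to one particular checkpoint; the model name and max_model_len are illustrative stand-ins, since the failing MODEL_NAME / MAX_MODEL_LEN parametrization is not visible in this log excerpt.

import os

# The CUDA error is reported asynchronously; force synchronous kernel launches
# so the stack trace points at the offending launch, as the log recommends.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from vllm import LLM

# Constructing the engine is enough: initialization runs
# determine_available_memory() -> profile_run() -> _dummy_sampler_run(),
# which is where the illegal memory access is raised in the CI log.
llm = LLM(
    model="nm-testing/test-w4a16-mixtral-actorder-group",  # illustrative; one of the test's parametrized models, not confirmed as the failing case
    dtype="half",            # the test forces fp16 for this MoE checkpoint
    max_model_len=4096,      # stand-in for the test's MAX_MODEL_LEN
    tensor_parallel_size=2,  # matches the failing configuration
)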
Logs
[2025-05-20T10:46:52Z] (VllmWorker rank=0 pid=12189) INFO 05-20 03:46:52 [backends.py:172] Compiling a graph for general shape takes 20.71 s
[2025-05-20T10:46:52Z] (VllmWorker rank=0 pid=12189) DEBUG 05-20 03:46:52 [backends.py:512] Computation graph saved to /root/.cache/vllm/torch_compile_cache/07e0a984e7/rank_0_0/computation_graph.py
[2025-05-20T10:46:55Z] (VllmWorker rank=1 pid=12191) DEBUG 05-20 03:46:55 [wrapper.py:105] Dynamo transformed code saved to /root/.cache/vllm/torch_compile_cache/07e0a984e7/rank_1_0/transformed_code.py
[2025-05-20T10:46:55Z] (VllmWorker rank=0 pid=12189) DEBUG 05-20 03:46:55 [wrapper.py:105] Dynamo transformed code saved to /root/.cache/vllm/torch_compile_cache/07e0a984e7/rank_0_0/transformed_code.py
[2025-05-20T10:46:57Z] DEBUG 05-20 03:46:57 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:47:04Z] (VllmWorker rank=1 pid=12191) INFO 05-20 03:47:04 [monitor.py:33] torch.compile takes 26.93 s in total
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) INFO 05-20 03:47:04 [monitor.py:33] torch.compile takes 27.04 s in total
[2025-05-20T10:47:04Z] [rank0]:[E520 03:47:04.619668070 ProcessGroupNCCL.cpp:1896] [PG ID 2 PG GUID 3 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
[2025-05-20T10:47:04Z] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T10:47:04Z] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T10:47:04Z] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
[2025-05-20T10:47:04Z] frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f81329785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:47:04Z] frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xe0 (0x7f813290d4a2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:47:04Z] frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3c2 (0x7f8132d26422 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
[2025-05-20T10:47:04Z] frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f80c268b456 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x70 (0x7f80c269b6f0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x782 (0x7f80c269d282 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f80c269ee8d in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #7: <unknown function> + 0xdc253 (0x7f80b29b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
[2025-05-20T10:47:04Z] frame #8: <unknown function> + 0x94ac3 (0x7f81b3fb9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:47:04Z] frame #9: clone + 0x44 (0x7f81b404aa04 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] terminate called after throwing an instance of 'c10::DistBackendError'
[2025-05-20T10:47:04Z] what(): [PG ID 2 PG GUID 3 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
[2025-05-20T10:47:04Z] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T10:47:04Z] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T10:47:04Z] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
[2025-05-20T10:47:04Z] frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f81329785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:47:04Z] frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xe0 (0x7f813290d4a2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:47:04Z] frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3c2 (0x7f8132d26422 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
[2025-05-20T10:47:04Z] frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f80c268b456 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x70 (0x7f80c269b6f0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x782 (0x7f80c269d282 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f80c269ee8d in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #7: <unknown function> + 0xdc253 (0x7f80b29b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
[2025-05-20T10:47:04Z] frame #8: <unknown function> + 0x94ac3 (0x7f81b3fb9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:47:04Z] frame #9: clone + 0x44 (0x7f81b404aa04 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Exception raised from ncclCommWatchdog at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1902 (most recent call first):
[2025-05-20T10:47:04Z] frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f81329785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:47:04Z] frame #1: <unknown function> + 0xcc7a4e (0x7f80c266da4e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #2: <unknown function> + 0x9165ed (0x7f80c22bc5ed in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:47:04Z] frame #3: <unknown function> + 0xdc253 (0x7f80b29b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
[2025-05-20T10:47:04Z] frame #4: <unknown function> + 0x94ac3 (0x7f81b3fb9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:47:04Z] frame #5: clone + 0x44 (0x7f81b404aa04 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Fatal Python error: Aborted
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Thread 0x00007f814f5da640 (most recent call first):
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 359 in wait
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 655 in wait
[2025-05-20T10:47:04Z] File "/usr/local/lib/python3.12/dist-packages/tqdm/_monitor.py", line 60 in run
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 1032 in _bootstrap
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Thread 0x00007f81549dc640 (most recent call first):
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 359 in wait
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 655 in wait
[2025-05-20T10:47:04Z] File "/usr/local/lib/python3.12/dist-packages/tqdm/_monitor.py", line 60 in run
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 1032 in _bootstrap
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Thread 0x00007f8166fe3640 (most recent call first):
[2025-05-20T10:47:04Z] File "/usr/local/lib/python3.12/dist-packages/vllm/usage/usage_lib.py", line 229 in _report_continuous_usage
[2025-05-20T10:47:04Z] File "/usr/local/lib/python3.12/dist-packages/vllm/usage/usage_lib.py", line 164 in _report_usage_worker
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 1012 in run
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/threading.py", line 1032 in _bootstrap
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Thread 0x00007f81b3f24000 (most recent call first):
[2025-05-20T10:47:04Z] File "/usr/lib/python3.12/logging/__init__.py", line 720 in format
[2025-05-20T10:47:04Z] File (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] WorkerProc hit an exception.
[2025-05-20T10:47:04Z] Fatal Python error: Segmentation fault
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] Traceback (most recent call last):
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] output = func(*args, **kwargs)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] return func(*args, **kwargs)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 185, in determine_available_memory
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] self.model_runner.profile_run()
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1856, in profile_run
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] sampler_output = self._dummy_sampler_run(hidden_states)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] return func(*args, **kwargs)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1757, in _dummy_sampler_run
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] raise e
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1747, in _dummy_sampler_run
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] sampler_output = self.sampler(logits=logits,
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 49, in forward
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] sampled = self.sample(logits, sampling_metadata)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 125, in sample
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] sampled = torch.where(
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-20T10:47:04Z]
[2025-05-20T10:47:04Z] Extension modules: (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T10:47:04Z] zstandard.backend_c(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522]
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] Traceback (most recent call last):
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
[2025-05-20T10:47:04Z] , (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] output = func(*args, **kwargs)
[2025-05-20T10:47:04Z] charset_normalizer.md(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] return func(*args, **kwargs)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 185, in determine_available_memory
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] self.model_runner.profile_run()
[2025-05-20T10:47:04Z] , regex._regex(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1856, in profile_run
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] sampler_output = self._dummy_sampler_run(hidden_states)
[2025-05-20T10:47:04Z] , (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] numpy.core._multiarray_umath(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] return func(*args, **kwargs)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] , numpy.core._multiarray_tests(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1757, in _dummy_sampler_run
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] raise e
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1747, in _dummy_sampler_run
[2025-05-20T10:47:04Z] , numpy.linalg._umath_linalg(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] sampler_output = self.sampler(logits=logits,
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[2025-05-20T10:47:04Z] , numpy.fft._pocketfft_internal(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
[2025-05-20T10:47:04Z] , numpy.random._common(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[2025-05-20T10:47:04Z] , numpy.random.bit_generator(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
[2025-05-20T10:47:04Z] , (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] numpy.random._bounded_integers(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 49, in forward
[2025-05-20T10:47:04Z] , numpy.random._mt19937(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] sampled = self.sample(logits, sampling_metadata)
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] , (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 125, in sample
[2025-05-20T10:47:04Z] numpy.random.mtrand(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] sampled = torch.where(
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] ^^^^^^^^^^^^
[2025-05-20T10:47:04Z] , (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-20T10:47:04Z] numpy.random._philox(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T10:47:04Z] , numpy.random._pcg64(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T10:47:04Z] , numpy.random._sfc64(VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522]
[2025-05-20T10:47:04Z] (VllmWorker rank=0 pid=12189) ERROR 05-20 03:47:04 [multiproc_executor.py:522]
[2025-05-20T10:47:04Z] , numpy.random._generator, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, yaml._yaml, PIL._imaging, markupsafe._speedups, sklearn.__check_build._check_build, psutil._psutil_linux, psutil._psutil_posix, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._cdflib, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._ansari_swilk_statistics, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.stats._unuran.unuran_wrapper, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, 
pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, numexpr.interpreter, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, zmq.backend.cython._zmq, PIL._imagingft, hiredis.hiredis, msgspec._core, _cffi_backend, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._helpers, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket, frozenlist._frozenlist, msgpack._cmsgpack, google._upb._message, setproctitle, uvloop.loop, ray._raylet, sentencepiece._sentencepiece, vllm.cumem_allocator, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.typing.builtins.itertools, numba.cpython.builtins.math, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, cuda_utils, __triton_launcher (total: 231)
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] EngineCore failed to start.
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] Traceback (most recent call last):
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 480, in run_engine_core
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] engine_core = EngineCoreProc(*args, **kwargs)
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 379, in __init__
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] super().__init__(vllm_config, executor_class, log_stats,
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 74, in __init__
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] self._initialize_kv_caches(vllm_config)
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 133, in _initialize_kv_caches
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] available_gpu_memory = self.model_executor.determine_available_memory()
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in determine_available_memory
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] output = self.collective_rpc("determine_available_memory")
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 215, in collective_rpc
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] result = get_response(w, dequeue_timeout)
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 202, in get_response
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] raise RuntimeError(
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] RuntimeError: Worker failed with error 'CUDA error: an illegal memory access was encountered
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T10:47:04Z] ERROR 05-20 03:47:04 [core.py:489] ', please check the stack trace above for the root cause
[2025-05-20T10:47:07Z] DEBUG 05-20 03:47:07 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:47:17Z] DEBUG 05-20 03:47:17 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:47:27Z] DEBUG 05-20 03:47:27 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:47:37Z] DEBUG 05-20 03:47:37 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:47:47Z] DEBUG 05-20 03:47:47 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:47:57Z] DEBUG 05-20 03:47:57 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:48:04Z] (VllmWorker rank=1 pid=12191) DEBUG 05-20 03:48:04 [shm_broadcast.py:430] No available shared memory broadcast block found in 60 second.
[2025-05-20T10:48:08Z] DEBUG 05-20 03:48:08 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:48:18Z] DEBUG 05-20 03:48:18 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:48:28Z] DEBUG 05-20 03:48:28 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:48:38Z] DEBUG 05-20 03:48:38 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:48:48Z] DEBUG 05-20 03:48:48 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:48:58Z] DEBUG 05-20 03:48:58 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:49:04Z] (VllmWorker rank=1 pid=12191) DEBUG 05-20 03:49:04 [shm_broadcast.py:430] No available shared memory broadcast block found in 60 second.
[2025-05-20T10:49:08Z] DEBUG 05-20 03:49:08 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:49:18Z] DEBUG 05-20 03:49:18 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:49:28Z] DEBUG 05-20 03:49:28 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:49:38Z] DEBUG 05-20 03:49:38 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:49:48Z] DEBUG 05-20 03:49:48 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:49:58Z] DEBUG 05-20 03:49:58 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:50:04Z] (VllmWorker rank=1 pid=12191) DEBUG 05-20 03:50:04 [shm_broadcast.py:430] No available shared memory broadcast block found in 60 second.
[2025-05-20T10:50:08Z] DEBUG 05-20 03:50:08 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:50:18Z] DEBUG 05-20 03:50:18 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:50:28Z] DEBUG 05-20 03:50:28 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:50:38Z] DEBUG 05-20 03:50:38 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:50:48Z] DEBUG 05-20 03:50:48 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:50:58Z] DEBUG 05-20 03:50:58 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:51:04Z] (VllmWorker rank=1 pid=12191) DEBUG 05-20 03:51:04 [shm_broadcast.py:430] No available shared memory broadcast block found in 60 second.
[2025-05-20T10:51:08Z] DEBUG 05-20 03:51:08 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:51:18Z] DEBUG 05-20 03:51:18 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:51:28Z] DEBUG 05-20 03:51:28 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:51:38Z] DEBUG 05-20 03:51:38 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:51:48Z] DEBUG 05-20 03:51:48 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:51:58Z] DEBUG 05-20 03:51:58 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:52:04Z] (VllmWorker rank=1 pid=12191) DEBUG 05-20 03:52:04 [shm_broadcast.py:430] No available shared memory broadcast block found in 60 second.
[2025-05-20T10:52:05Z] [rank1]:[W520 03:52:05.308296965 TCPStore.cpp:125] [c10d] recvValue failed on SocketImpl(fd=135, addr=[localhost]:59310, remote=[localhost]:44713): Connection reset by peer
[2025-05-20T10:52:05Z] Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:675 (most recent call first):
[2025-05-20T10:52:05Z] frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f81329785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:52:05Z] frame #1: <unknown function> + 0x5ba8afe (0x7f8116a3cafe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:05Z] frame #2: <unknown function> + 0x5baaecf (0x7f8116a3eecf in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:05Z] frame #3: <unknown function> + 0x5bab74a (0x7f8116a3f74a in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:05Z] frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x2a9 (0x7f8116a391a9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:05Z] frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7f80c2699989 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:52:05Z] frame #6: <unknown function> + 0xdc253 (0x7f80b29b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
[2025-05-20T10:52:05Z] frame #7: <unknown function> + 0x94ac3 (0x7f81b3fb9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:05Z] frame #8: clone + 0x44 (0x7f81b404aa04 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:05Z]
[2025-05-20T10:52:05Z] [rank1]:[W520 03:52:05.311880662 ProcessGroupNCCL.cpp:1659] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Connection reset by peer
[2025-05-20T10:52:05Z] ERROR 05-20 03:52:05 [multiproc_executor.py:135] Worker proc VllmWorker-0 died unexpectedly, shutting down executor.
[2025-05-20T10:52:06Z] [rank1]:[W520 03:52:06.312051397 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=135, addr=[localhost]:59310, remote=[localhost]:44713): Broken pipe
[2025-05-20T10:52:06Z] Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
[2025-05-20T10:52:06Z] frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f81329785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:52:06Z] frame #1: <unknown function> + 0x5ba8afe (0x7f8116a3cafe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:06Z] frame #2: <unknown function> + 0x5baa358 (0x7f8116a3e358 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:06Z] frame #3: <unknown function> + 0x5babb3e (0x7f8116a3fb3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:06Z] frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7f8116a39198 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:06Z] frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7f80c2699989 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:52:06Z] frame #6: <unknown function> + 0xdc253 (0x7f80b29b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
[2025-05-20T10:52:06Z] frame #7: <unknown function> + 0x94ac3 (0x7f81b3fb9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:06Z] frame #8: clone + 0x44 (0x7f81b404aa04 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:06Z]
[2025-05-20T10:52:06Z] [rank1]:[W520 03:52:06.315138190 ProcessGroupNCCL.cpp:1659] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[2025-05-20T10:52:07Z] [rank1]:[W520 03:52:07.315243724 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=135, addr=[localhost]:59310, remote=[localhost]:44713): Broken pipe
[2025-05-20T10:52:07Z] Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
[2025-05-20T10:52:07Z] frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f81329785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:52:07Z] frame #1: <unknown function> + 0x5ba8afe (0x7f8116a3cafe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:07Z] frame #2: <unknown function> + 0x5baa358 (0x7f8116a3e358 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:07Z] frame #3: <unknown function> + 0x5babb3e (0x7f8116a3fb3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:07Z] frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7f8116a39198 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:07Z] frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7f80c2699989 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:52:07Z] frame #6: <unknown function> + 0xdc253 (0x7f80b29b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
[2025-05-20T10:52:07Z] frame #7: <unknown function> + 0x94ac3 (0x7f81b3fb9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:07Z] frame #8: clone + 0x44 (0x7f81b404aa04 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:07Z]
[2025-05-20T10:52:07Z] [rank1]:[W520 03:52:07.317919386 ProcessGroupNCCL.cpp:1659] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[2025-05-20T10:52:08Z] DEBUG 05-20 03:52:08 [core_client.py:476] Waiting for 1 local, 0 remote core engine proc(s) to start.
[2025-05-20T10:52:08Z] [rank1]:[W520 03:52:08.318060510 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=135, addr=[localhost]:59310, remote=[localhost]:44713): Broken pipe
[2025-05-20T10:52:08Z] Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
[2025-05-20T10:52:08Z] frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f81329785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:52:08Z] frame #1: <unknown function> + 0x5ba8afe (0x7f8116a3cafe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:08Z] frame #2: <unknown function> + 0x5baa358 (0x7f8116a3e358 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:08Z] frame #3: <unknown function> + 0x5babb3e (0x7f8116a3fb3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:08Z] frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7f8116a39198 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:08Z] frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7f80c2699989 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:52:08Z] frame #6: <unknown function> + 0xdc253 (0x7f80b29b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
[2025-05-20T10:52:08Z] frame #7: <unknown function> + 0x94ac3 (0x7f81b3fb9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:08Z] frame #8: clone + 0x44 (0x7f81b404aa04 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:08Z]
[2025-05-20T10:52:08Z] [rank1]:[W520 03:52:08.321126013 ProcessGroupNCCL.cpp:1659] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[2025-05-20T10:52:09Z] [rank1]:[W520 03:52:09.321236826 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=135, addr=[localhost]:59310, remote=[localhost]:44713): Broken pipe
[2025-05-20T10:52:09Z] Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
[2025-05-20T10:52:09Z] frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f81329785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
[2025-05-20T10:52:09Z] frame #1: <unknown function> + 0x5ba8afe (0x7f8116a3cafe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:09Z] frame #2: <unknown function> + 0x5baa358 (0x7f8116a3e358 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:09Z] frame #3: <unknown function> + 0x5babb3e (0x7f8116a3fb3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:09Z] frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7f8116a39198 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
[2025-05-20T10:52:09Z] frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7f80c2699989 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
[2025-05-20T10:52:09Z] frame #6: <unknown function> + 0xdc253 (0x7f80b29b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
[2025-05-20T10:52:09Z] frame #7: <unknown function> + 0x94ac3 (0x7f81b3fb9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:09Z] frame #8: clone + 0x44 (0x7f81b404aa04 in /lib/x86_64-linux-gnu/libc.so.6)
[2025-05-20T10:52:09Z]
[2025-05-20T10:52:09Z] [rank1]:[W520 03:52:09.323951540 ProcessGroupNCCL.cpp:1659] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[2025-05-20T10:52:09Z] Process EngineCore_0:
[2025-05-20T10:52:09Z] Traceback (most recent call last):
[2025-05-20T10:52:09Z] File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
[2025-05-20T10:52:09Z] self.run()
[2025-05-20T10:52:09Z] File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
[2025-05-20T10:52:09Z] self._target(*self._args, **self._kwargs)
[2025-05-20T10:52:09Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 493, in run_engine_core
[2025-05-20T10:52:09Z] raise e
[2025-05-20T10:52:09Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 480, in run_engine_core
[2025-05-20T10:52:09Z] engine_core = EngineCoreProc(*args, **kwargs)
[2025-05-20T10:52:09Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:52:09Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 379, in __init__
[2025-05-20T10:52:09Z] super().__init__(vllm_config, executor_class, log_stats,
[2025-05-20T10:52:09Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 74, in __init__
[2025-05-20T10:52:09Z] self._initialize_kv_caches(vllm_config)
[2025-05-20T10:52:09Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 133, in _initialize_kv_caches
[2025-05-20T10:52:09Z] available_gpu_memory = self.model_executor.determine_available_memory()
[2025-05-20T10:52:09Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:52:09Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 75, in determine_available_memory
[2025-05-20T10:52:09Z] output = self.collective_rpc("determine_available_memory")
[2025-05-20T10:52:09Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:52:09Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 215, in collective_rpc
[2025-05-20T10:52:09Z] result = get_response(w, dequeue_timeout)
[2025-05-20T10:52:09Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-20T10:52:09Z] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 202, in get_response
[2025-05-20T10:52:09Z] raise RuntimeError(
[2025-05-20T10:52:09Z] RuntimeError: Worker failed with error 'CUDA error: an illegal memory access was encountered
[2025-05-20T10:52:09Z] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-20T10:52:09Z] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-20T10:52:09Z] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-05-20T10:52:09Z] ', please check the stack trace above for the root cause
[2025-05-20T10:52:10Z] /usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 3 leaked shared_memory objects to clean up at shutdown
[2025-05-20T10:52:10Z] warnings.warn('resource_tracker: There appear to be %d '
[2025-05-20T10:52:10Z] F
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] =================================== FAILURES ===================================
[2025-05-20T10:52:10Z] _____________________________ test_weight_loading ______________________________
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] vllm_runner = <class 'tests.conftest.VllmRunner'>
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] @pytest.mark.skipif(
[2025-05-20T10:52:10Z] MODEL_NAME == "casperhansen/deepseek-coder-v2-instruct-awq",
[2025-05-20T10:52:10Z] reason="OOM in the CI")
[2025-05-20T10:52:10Z] @pytest.mark.skipif(
[2025-05-20T10:52:10Z] not current_platform.has_device_capability(int(MIN_CAPABILITY)),
[2025-05-20T10:52:10Z] reason="Current system does not have minimum capability.")
[2025-05-20T10:52:10Z] def test_weight_loading(vllm_runner):
[2025-05-20T10:52:10Z] """
[2025-05-20T10:52:10Z] Test parameter weight loading with tp>1.
[2025-05-20T10:52:10Z] """
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] # MoE models need fp16.
[2025-05-20T10:52:10Z] NEEDS_FP16 = (QUANTIZATION == "gptq" or MODEL_NAME
[2025-05-20T10:52:10Z] == "nm-testing/test-w4a16-mixtral-actorder-group")
[2025-05-20T10:52:10Z] > with vllm_runner(
[2025-05-20T10:52:10Z] model_name=MODEL_NAME,
[2025-05-20T10:52:10Z] revision=REVISION,
[2025-05-20T10:52:10Z] dtype=torch.half if NEEDS_FP16 else "auto",
[2025-05-20T10:52:10Z] quantization=None if QUANTIZATION == "None" else QUANTIZATION,
[2025-05-20T10:52:10Z] max_model_len=MAX_MODEL_LEN,
[2025-05-20T10:52:10Z] tensor_parallel_size=2) as model:
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] weight_loading/test_weight_loading.py:32:
[2025-05-20T10:52:10Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2025-05-20T10:52:10Z] conftest.py:762: in __init__
[2025-05-20T10:52:10Z] self.model = LLM(
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/utils.py:1177: in inner
[2025-05-20T10:52:10Z] return fn(*args, **kwargs)
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/entrypoints/llm.py:250: in __init__
[2025-05-20T10:52:10Z] self.llm_engine = LLMEngine.from_engine_args(
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py:511: in from_engine_args
[2025-05-20T10:52:10Z] return engine_cls.from_vllm_config(
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/v1/engine/llm_engine.py:115: in from_vllm_config
[2025-05-20T10:52:10Z] return cls(vllm_config=vllm_config,
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/v1/engine/llm_engine.py:92: in __init__
[2025-05-20T10:52:10Z] self.engine_core = EngineCoreClient.make_client(
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:75: in make_client
[2025-05-20T10:52:10Z] return SyncMPClient(vllm_config, executor_class, log_stats)
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:580: in __init__
[2025-05-20T10:52:10Z] super().__init__(
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:418: in __init__
[2025-05-20T10:52:10Z] self._wait_for_engine_startup(output_address, parallel_config)
[2025-05-20T10:52:10Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] self = <vllm.v1.engine.core_client.SyncMPClient object at 0x7f7f5ea652e0>
[2025-05-20T10:52:10Z] output_address = 'ipc:///tmp/d003abd2-4e16-42b2-9050-bf1e9dc8d357'
[2025-05-20T10:52:10Z] parallel_config = ParallelConfig(pipeline_parallel_size=1, tensor_parallel_size=2, data_parallel_size=1, data_parallel_size_local=1, dat...p', worker_cls='vllm.v1.worker.gpu_worker.Worker', sd_worker_cls='auto', worker_extension_cls='', world_size=2, rank=0)
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] def _wait_for_engine_startup(self, output_address: str,
[2025-05-20T10:52:10Z] parallel_config: ParallelConfig):
[2025-05-20T10:52:10Z] # Get a sync handle to the socket which can be sync or async.
[2025-05-20T10:52:10Z] sync_input_socket = zmq.Socket.shadow(self.input_socket)
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] # Wait for engine core process(es) to send ready messages.
[2025-05-20T10:52:10Z] local_count = parallel_config.data_parallel_size_local
[2025-05-20T10:52:10Z] remote_count = len(self.core_engines) - local_count
[2025-05-20T10:52:10Z] # [local, remote] counts
[2025-05-20T10:52:10Z] conn_pending, start_pending = [local_count, remote_count], [0, 0]
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] poller = zmq.Poller()
[2025-05-20T10:52:10Z] poller.register(sync_input_socket, zmq.POLLIN)
[2025-05-20T10:52:10Z] proc_manager = self.resources.local_engine_manager
[2025-05-20T10:52:10Z] if proc_manager is not None:
[2025-05-20T10:52:10Z] for sentinel in proc_manager.sentinels():
[2025-05-20T10:52:10Z] poller.register(sentinel, zmq.POLLIN)
[2025-05-20T10:52:10Z] while any(conn_pending) or any(start_pending):
[2025-05-20T10:52:10Z] events = poller.poll(STARTUP_POLL_PERIOD_MS)
[2025-05-20T10:52:10Z] if not events:
[2025-05-20T10:52:10Z] if any(conn_pending):
[2025-05-20T10:52:10Z] logger.debug(
[2025-05-20T10:52:10Z] "Waiting for %d local, %d remote core engine proc(s) "
[2025-05-20T10:52:10Z] "to connect.", *conn_pending)
[2025-05-20T10:52:10Z] if any(start_pending):
[2025-05-20T10:52:10Z] logger.debug(
[2025-05-20T10:52:10Z] "Waiting for %d local, %d remote core engine proc(s) "
[2025-05-20T10:52:10Z] "to start.", *start_pending)
[2025-05-20T10:52:10Z] continue
[2025-05-20T10:52:10Z] if len(events) > 1 or events[0][0] != sync_input_socket:
[2025-05-20T10:52:10Z] # One of the local core processes exited.
[2025-05-20T10:52:10Z] finished = proc_manager.finished_procs(
[2025-05-20T10:52:10Z] ) if proc_manager else {}
[2025-05-20T10:52:10Z] > raise RuntimeError("Engine core initialization failed. "
[2025-05-20T10:52:10Z] "See root cause above. "
[2025-05-20T10:52:10Z] f"Failed core proc(s): {finished}")
[2025-05-20T10:52:10Z] E RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_0': 1}
[2025-05-20T10:52:10Z]
[2025-05-20T10:52:10Z] /usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:484: RuntimeError
Metadata
Status
Done