Description
Your current environment
The output of `python collect_env.py`
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.30.2
Libc version: glibc-2.31
Python version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-1066-aws-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA L4
GPU 1: NVIDIA L4
GPU 2: NVIDIA L4
GPU 3: NVIDIA L4
Nvidia driver version: 535.183.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.2
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 25
Model: 1
Model name: AMD EPYC 7R13 Processor
Stepping: 1
CPU MHz: 2322.855
BogoMIPS: 5299.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 768 KiB
L1i cache: 768 KiB
L2 cache: 12 MiB
L3 cache: 96 MiB
NUMA node0 CPU(s): 0-47
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; safe RET, no microcode
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext invpcid_single ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save vaes vpclmulqdq rdpid
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] pyzmq==26.1.0
[pip3] torch==2.4.0
[pip3] torch-model-archiver==0.11.0
[pip3] torchaudio==2.3.0+cu121
[pip3] torchserve==0.11.0
[pip3] torchtext==0.18.0+cu121
[pip3] torchvision==0.19.0
[pip3] transformers==4.44.0
[pip3] triton==3.0.0
[conda] mkl 2024.0.0 ha957f24_49657 conda-forge
[conda] mkl-include 2024.1.0 ha957f24_693 conda-forge
[conda] numpy 1.26.4 py311h64a7726_0 conda-forge
[conda] nvidia-nccl-cu12 2.20.5 pypi_0 pypi
[conda] pyzmq 26.1.0 pypi_0 pypi
[conda] torch 2.4.0 pypi_0 pypi
[conda] torch-model-archiver 0.11.0 pypi_0 pypi
[conda] torchaudio 2.3.0+cu121 pypi_0 pypi
[conda] torchserve 0.11.0 pypi_0 pypi
[conda] torchtext 0.18.0+cu121 pypi_0 pypi
[conda] torchvision 0.19.0 pypi_0 pypi
[conda] transformers 4.44.0 pypi_0 pypi
[conda] triton 3.0.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.4@4db5176d9758b720b05460c50ace3c01026eb158
vLLM Build Flags:
CUDA Archs: 5.0 7.0+PTX 7.5+PTX 8.0 8.6 9.0; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 GPU1 GPU2 GPU3 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X SYS SYS SYS 0-47 0 N/A
GPU1 SYS X SYS SYS 0-47 0 N/A
GPU2 SYS SYS X SYS 0-47 0 N/A
GPU3 SYS SYS SYS X 0-47 0 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
🐛 Describe the bug
I am trying to serve a Mixtral 8x7B AWQ model on a 4-GPU node and want to run two instances of the same model on that node. My understanding from the documentation was that I could use `--tensor-parallel-size 2 --pipeline-parallel-size 2` to distribute them. I expected this to produce two instances of the model, each using two GPUs, but instead it throws the error below.
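A minimal sketch of the launch command I am using (the model name and port here are illustrative placeholders, not my exact invocation):

```shell
# Serve an AWQ-quantized Mixtral with TP=2 and PP=2 on a 4-GPU node.
# Expectation was two 2-GPU instances; actual behavior is the crash below.
python -m vllm.entrypoints.openai.api_server \
    --model TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ \
    --quantization awq \
    --tensor-parallel-size 2 \
    --pipeline-parallel-size 2 \
    --port 8000
```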
(VllmWorkerProcess pid=138) INFO 08-13 15:22:51 model_runner.py:732] Loading model weights took 11.4953 GB
(VllmWorkerProcess pid=137) INFO 08-13 15:22:51 model_runner.py:732] Loading model weights took 11.4953 GB
(VllmWorkerProcess pid=139) INFO 08-13 15:22:52 model_runner.py:732] Loading model weights took 11.4953 GB
INFO 08-13 15:22:52 model_runner.py:732] Loading model weights took 11.4953 GB
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: 'MixtralForCausalLM' object has no attribute 'make_empty_intermediate_tensors', Traceback (most recent call last):
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] self.model_runner.profile_run()
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 936, in profile_run
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] intermediate_tensors = self.model.make_empty_intermediate_tensors(
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1729, in __getattr__
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] AttributeError: 'MixtralForCausalLM' object has no attribute 'make_empty_intermediate_tensors'
(VllmWorkerProcess pid=139) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: 'MixtralForCausalLM' object has no attribute 'make_empty_intermediate_tensors', Traceback (most recent call last):
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] self.model_runner.profile_run()
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 936, in profile_run
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] intermediate_tensors = self.model.make_empty_intermediate_tensors(
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1729, in __getattr__
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226] AttributeError: 'MixtralForCausalLM' object has no attribute 'make_empty_intermediate_tensors'
(VllmWorkerProcess pid=138) ERROR 08-13 15:22:52 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: list index out of range, Traceback (most recent call last):
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] self.model_runner.profile_run()
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 940, in profile_run
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] self.execute_model(model_input, kv_caches, intermediate_tensors)
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1363, in execute_model
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] hidden_or_intermediate_states = model_executable(
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/vllm/model_executor/models/mixtral_quant.py", line 361, in forward
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] hidden_states = self.model(input_ids, positions, kv_caches,
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] File "/opt/conda/lib/python3.11/site-packages/vllm/model_executor/models/mixtral_quant.py", line 328, in forward
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] kv_caches[i], attn_metadata,
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] ~~~~~~~~~^^^
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226] IndexError: list index out of range
(VllmWorkerProcess pid=137) ERROR 08-13 15:23:05 multiproc_worker_utils.py:226]
/opt/conda/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
ERROR 08-13 15:23:05 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 138 died, exit code: -15
INFO 08-13 15:23:05 multiproc_worker_utils.py:123] Killing local vLLM worker processes
Process Process-1:
Traceback (most recent call last):
File "/opt/conda/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/opt/conda/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.11/site-packages/vllm/entrypoints/openai/rpc/server.py", line 217, in run_rpc_server
server = AsyncEngineRPCServer(async_engine_args, usage_context, port)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/vllm/entrypoints/openai/rpc/server.py", line 25, in __init__
self.engine = AsyncLLMEngine.from_engine_args(async_engine_args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 471, in from_engine_args
engine = cls(
^^^^
File "/opt/conda/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 381, in __init__
self.engine = self._init_engine(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 552, in _init_engine
return engine_class(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 263, in __init__
self._initialize_kv_caches()
File "/opt/conda/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 362, in _initialize_kv_caches
self.model_executor.determine_num_available_blocks())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/vllm/executor/distributed_gpu_executor.py", line 38, in determine_num_available_blocks
num_blocks = self._run_workers("determine_num_available_blocks", )
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 192, in _run_workers
driver_worker_output = driver_worker_method(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
self.model_runner.profile_run()
File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 940, in profile_run
self.execute_model(model_input, kv_caches, intermediate_tensors)
File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1363, in execute_model
hidden_or_intermediate_states = model_executable(
^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/vllm/model_executor/models/mixtral_quant.py", line 361, in forward
hidden_states = self.model(input_ids, positions, kv_caches,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/vllm/model_executor/models/mixtral_quant.py", line 328, in forward
kv_caches[i], attn_metadata,
~~~~~~~~~^^^
IndexError: list index out of range
/opt/conda/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '