Hi, thanks for sharing your great work. I ran into a KeyError after following your installation steps. The reported error is as follows:
(WorkerDict pid=595461) [rank0]: File "/mnt/anaconda3_new/envs/deepeyes/lib/python3.10/site-packages/ray/_private/function_manager.py", line 689, in actor_method_executor
(WorkerDict pid=595461) [rank0]: return method(__ray_actor, *args, **kwargs)
(WorkerDict pid=595461) [rank0]: File "/mnt/anaconda3_new/envs/deepeyes/lib/python3.10/site-packages/ray/util/tracing/tracing_helper.py", line 463, in _resume_span
(WorkerDict pid=595461) [rank0]: return method(self, *_args, **_kwargs)
(WorkerDict pid=595461) [rank0]: File "/mnt/DeepEyes/verl/single_controller/ray/base.py", line 446, in func
(WorkerDict pid=595461) [rank0]: return getattr(self.worker_dict[key], name)(*args, **kwargs)
(WorkerDict pid=595461) [rank0]: File "/mnt/DeepEyes/verl/single_controller/base/decorator.py", line 413, in inner
(WorkerDict pid=595461) [rank0]: return func(*args, **kwargs)
(WorkerDict pid=595461) [rank0]: File "/mnt/DeepEyes/verl/workers/fsdp_workers.py", line 560, in generate_sequences
(WorkerDict pid=595461) [rank0]: with self.rollout_sharding_manager:
(WorkerDict pid=595461) [rank0]: File "/mnt/DeepEyes/verl/utils/debug/performance.py", line 61, in f
(WorkerDict pid=595461) [rank0]: return self.log(decorated_function, *args, **kwargs)
(WorkerDict pid=595461) [rank0]: File "/mnt/DeepEyes/verl/utils/debug/performance.py", line 70, in log
(WorkerDict pid=595461) [rank0]: output = func(*args, **kwargs)
(WorkerDict pid=595461) [rank0]: File "/mnt/DeepEyes/verl/workers/sharding_manager/fsdp_vllm.py", line 119, in __enter__
(WorkerDict pid=595461) [rank0]: self.update_params(params)
(WorkerDict pid=595461) [rank0]: File "/mnt/DeepEyes/verl/workers/sharding_manager/fsdp_vllm.py", line 215, in update_params
(WorkerDict pid=595461) [rank0]: loaded_params = model.load_weights(
(WorkerDict pid=595461) [rank0]: File "/mnt/anaconda3_new/envs/deepeyes/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 1098, in load_weights
(WorkerDict pid=595461) [rank0]: return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(WorkerDict pid=595461) [rank0]: File "/mnt/anaconda3_new/envs/deepeyes/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 235, in load_weights
(WorkerDict pid=595461) [rank0]: autoloaded_weights = set(self._load_module("", self.module, weights))
(WorkerDict pid=595461) [rank0]: File "/mnt/anaconda3_new/envs/deepeyes/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 196, in _load_module
(WorkerDict pid=595461) [rank0]: yield from self._load_module(prefix,
(WorkerDict pid=595461) [rank0]: File "/mnt/anaconda3_new/envs/deepeyes/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 173, in _load_module
(WorkerDict pid=595461) [rank0]: loaded_params = module_load_weights(weights)
(WorkerDict pid=595461) [rank0]: File "/mnt/anaconda3_new/envs/deepeyes/lib/python3.10/site-packages/vllm/model_executor/models/qwen2.py", line 490, in load_weights
(WorkerDict pid=595461) [rank0]: return loader.load_weights(weights)
(WorkerDict pid=595461) [rank0]: File "/mnt/anaconda3_new/envs/deepeyes/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 235, in load_weights
(WorkerDict pid=595461) [rank0]: autoloaded_weights = set(self._load_module("", self.module, weights))
(WorkerDict pid=595461) [rank0]: File "/mnt/anaconda3_new/envs/deepeyes/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 196, in _load_module
(WorkerDict pid=595461) [rank0]: yield from self._load_module(prefix,
(WorkerDict pid=595461) [rank0]: File "/mnt/anaconda3_new/envs/deepeyes/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 173, in _load_module
(WorkerDict pid=595461) [rank0]: loaded_params = module_load_weights(weights)
(WorkerDict pid=595461) [rank0]: File "/mnt/anaconda3_new/envs/deepeyes/lib/python3.10/site-packages/vllm/model_executor/models/qwen2.py", line 400, in load_weights
(WorkerDict pid=595461) [rank0]: param = params_dict[name]
(WorkerDict pid=595461) [rank0]: KeyError: 'visual.patch_embed.proj.weight'
The issue seems to come from verl/workers/sharding_manager/fsdp_vllm.py, line 119 in the traceback above: self.update_params(params). From there the call ends up in load_weights(self, weights) of Qwen2Model in vllm/model_executor/models/qwen2.py, where the lookup params_dict[name] fails.
I have checked params_dict and weights_dict, and they are indeed different.
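For reference, the comparison can be sketched roughly as below. This is only an illustration: `diff_weight_names` is a hypothetical helper, not part of verl or vLLM, and the names passed in are made-up examples. In a real run the first argument would be the keys of `params_dict` and the second the names of the weights being loaded. Note that vLLM fuses some projections (for example q/k/v into `qkv_proj`), so a raw diff may also list names that the loader would normally remap.

```python
from collections import Counter
from typing import Iterable


def diff_weight_names(param_names: Iterable[str], weight_names: Iterable[str]) -> dict:
    """Compare the names a module expects with the names being loaded.

    Hypothetical debugging helper; it only works on plain strings.
    """
    expected = set(param_names)
    incoming = set(weight_names)
    unexpected = sorted(incoming - expected)
    return {
        # Names that would raise KeyError in a plain params_dict[name] lookup.
        "unexpected": unexpected,
        # Parameters that would receive no weight at all.
        "missing": sorted(expected - incoming),
        # Count unexpected names by top-level prefix, e.g. "visual".
        "unexpected_by_prefix": dict(
            Counter(name.split(".", 1)[0] for name in unexpected)
        ),
    }


# Illustrative names only; they are not dumped from a real model.
report = diff_weight_names(
    ["embed_tokens.weight", "layers.0.mlp.up_proj.weight", "norm.weight"],
    ["embed_tokens.weight", "norm.weight", "visual.patch_embed.proj.weight"],
)
print(report)
```

Grouping by prefix makes it easy to see whether a whole submodule (such as the vision tower) is being routed into the language model's loader, rather than a few individual tensors being misnamed.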
I have tried the following:
1. Replacing the self.update_params call with:
from verl.third_party.vllm import load_dtensor_weights
load_dtensor_weights(
params, self.inference_engine.llm_engine.model_executor.driver_worker.worker.model_runner.model)
This function was implemented in an older version of verl. With this change the KeyError no longer occurs, but the vLLM engine generates random, meaningless sequences, so I suspect the weights are still being loaded incorrectly.
2. Changing vllm==0.8.2 to vllm==0.8.0. This did not help.
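Since the attempts above involve switching package versions, a small check like the following can confirm which versions are actually active in the environment. It is a sketch that relies only on the standard library; the default package list (vllm, torch, transformers) is my assumption about what is relevant here, not something the installation steps specify.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("vllm", "torch", "transformers")):
    """Return {package: version string, or None if it is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment.
            result[name] = None
    return result


for name, ver in installed_versions().items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Running this inside the same conda environment that launches the Ray workers helps rule out the case where a different installation is picked up at runtime.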
Have you encountered a similar issue, and could you give some advice on how to deal with it?