
Error in newest dev version: Worker.__init__() got an unexpected keyword argument 'cache_config' #2640

@yippp

Description

I built the newest master branch, including the #2279 commit, and ran the following command:
python -m vllm.entrypoints.openai.api_server --model ./Mistral-7B-Instruct-v0.2-AWQ --quantization awq --dtype auto --host 0.0.0.0 --port 8081 --tensor-parallel-size 2
It fails with the following error:


INFO 01-29 09:41:47 api_server.py:209] args: Namespace(host='0.0.0.0', port=8081, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, root_path=None, middleware=[], model='./Mistral-7B-Instruct-v0.2-AWQ', tokenizer=None, revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', max_model_len=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=2, max_parallel_loading_workers=None, block_size=16, seed=0, swap_space=4, gpu_memory_utilization=0.9, max_num_batched_tokens=None, max_num_seqs=256, max_paddings=256, disable_log_stats=False, quantization='awq', enforce_eager=False, max_context_len_to_capture=8192, disable_custom_all_reduce=False, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
WARNING 01-29 09:41:47 config.py:177] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
2024-01-29 09:41:49,090 INFO worker.py:1724 -- Started a local Ray instance.
INFO 01-29 09:41:50 llm_engine.py:72] Initializing an LLM engine with config: model='./Mistral-7B-Instruct-v0.2-AWQ', tokenizer='./Mistral-7B-Instruct-v0.2-AWQ', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=awq, enforce_eager=False, kv_cache_dtype=auto, seed=0)
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/my/vllm/vllm/entrypoints/openai/api_server.py", line 217, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/my/vllm/vllm/engine/async_llm_engine.py", line 615, in from_engine_args
    engine = cls(parallel_config.worker_use_ray,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/my/vllm/vllm/engine/async_llm_engine.py", line 319, in __init__
    self.engine = self._init_engine(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/my/vllm/vllm/engine/async_llm_engine.py", line 364, in _init_engine
    return engine_class(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/my/vllm/vllm/engine/llm_engine.py", line 109, in __init__
    self._init_workers_ray(placement_group)
  File "/home/my/vllm/vllm/engine/llm_engine.py", line 260, in _init_workers_ray
    self.driver_worker = Worker(
                         ^^^^^^^
TypeError: Worker.__init__() got an unexpected keyword argument 'cache_config'
2024-01-29 09:41:54,676 ERROR worker.py:405 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::RayWorkerVllm.init_worker() (pid=3160378, ip=10.20.4.57, actor_id=ca7bf2aa56e3f1a0c1a7678201000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7f043133e7d0>)
  File "/home/my/vllm/vllm/engine/ray_utils.py", line 23, in init_worker
    self.worker = worker_init_fn()
                  ^^^^^^^^^^^^^^^^
  File "/home/my/vllm/vllm/engine/llm_engine.py", line 247, in <lambda>
    lambda rank=rank, local_rank=local_rank: Worker(
                                             ^^^^^^^
TypeError: Worker.__init__() got an unexpected keyword argument 'cache_config'
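
If I read the traceback right, the updated llm_engine.py now passes a cache_config keyword into Worker(), while the Worker.__init__ that actually gets imported still has an older signature without it. A self-contained sketch of that failure pattern in plain Python (all names here are illustrative, not the real vLLM signatures):

# Illustrative only: newer caller code invoking an older Worker class.
class Worker:
    # Old-style constructor that predates the cache_config keyword.
    def __init__(self, model_config, parallel_config, rank):
        self.rank = rank

# Newer engine code passes the extra keyword, so Python raises the same
# error as above before any vLLM logic runs.
Worker(model_config=None, parallel_config=None, rank=0, cache_config=None)
# TypeError: Worker.__init__() got an unexpected keyword argument 'cache_config'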

I am running Python 3.11, CUDA 12.1, and driver 530 on 2x RTX 3090 with NVLink.
I noticed there is a discussion (#2279 (comment)) about cache_config, but I am not sure whether it is related.
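
To check whether a stale installed copy of vllm is shadowing the fresh build, this plain-Python introspection (assuming Worker still lives at vllm.worker.worker, as in the traceback) shows which module is imported and what signature it has:

import inspect

import vllm
from vllm.worker.worker import Worker

print(vllm.__file__)                       # path of the vllm actually imported
print(inspect.signature(Worker.__init__))  # does it accept cache_config?

If vllm.__file__ points at an old site-packages copy rather than the source tree, or the printed signature has no cache_config parameter, the rebuild did not replace the previously installed version.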
