2 files changed: +7 −0 lines changed

@@ -18,3 +18,8 @@ See the supported models [here](https://vllm.readthedocs.io/en/latest/models/sup
 ```
 python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.3 --tokenizer hf-internal-testing/llama-tokenizer
 ```
+
+If you use an AWQ quantized model, try
+```
+python3 -m fastchat.serve.vllm_worker --model-path TheBloke/vicuna-7B-v1.5-AWQ --quantization awq
+```
@@ -210,6 +210,8 @@ async def api_model_details(request: Request):
     args.model = args.model_path
     if args.num_gpus > 1:
         args.tensor_parallel_size = args.num_gpus
+    if args.quantization:
+        args.quantization = args.quantization
 
     engine_args = AsyncEngineArgs.from_cli_args(args)
     engine = AsyncLLMEngine.from_engine_args(engine_args)
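
For reference, a minimal sketch of how the new flag reaches vLLM, assuming the vLLM version in use registers `--quantization` (with `awq` among its choices) via `AsyncEngineArgs.add_cli_args` and reads it back in `AsyncEngineArgs.from_cli_args`; the `--model-path` and `--num-gpus` flags below mirror the worker's own CLI and are not part of vLLM:

```python
# Sketch of the flag plumbing used by vllm_worker.py (assumed vLLM API:
# AsyncEngineArgs.add_cli_args / from_cli_args with a `quantization` field).
import argparse

from vllm.engine.arg_utils import AsyncEngineArgs

parser = argparse.ArgumentParser()
# Worker-style flags (names mirror the docs snippet above).
parser.add_argument("--model-path", type=str, default="TheBloke/vicuna-7B-v1.5-AWQ")
parser.add_argument("--num-gpus", type=int, default=1)
# vLLM adds its own flags, including --quantization, to the same parser.
parser = AsyncEngineArgs.add_cli_args(parser)

args = parser.parse_args(["--quantization", "awq"])

# Same mapping the worker performs before handing args to vLLM.
args.model = args.model_path
if args.num_gpus > 1:
    args.tensor_parallel_size = args.num_gpus

engine_args = AsyncEngineArgs.from_cli_args(args)
print(engine_args.model, engine_args.quantization)  # TheBloke/vicuna-7B-v1.5-AWQ awq
```

Because `from_cli_args` already reads `args.quantization` off the parsed namespace, the `if args.quantization:` self-assignment in the hunk above appears to be a no-op; it documents intent rather than changing behavior.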